Journal of Applied Psychology 


Edited by
Donald G. Paterson


University of Minnesota 




Consulting Editors 


GEORGE K. BENNETT, Psychological Corporation; WALTER V. BINGHAM, Washington, D. C.; HAROLD
E. BURTT, Ohio State University; ALLEN L. EDWARDS, University of Washington; IRVING LORGE, T.
C. Columbia University; QUINN MCNEMAR, Stanford University; JAMES P. PORTER, Danville, Illinois;
JULIAN B. ROTTER, Ohio State University; EDWARD K. STRONG, JR., Stanford University; DONALD
E. SUPER, T. C. Columbia University; MORRIS S. VITELES, University of Pennsylvania; ALFRED C.
WELCH, Knox-Reeves, Minneapolis.


Volume 33, 1949 


Prince and Lemon Sts., Lancaster, Pa., and 1515 Massachusetts Ave., NW, Washington 5, D. C. 


Entered as second-class matter, August 19, 1943, at the post office at Lancaster, Pa., under the act of March 3, 1879


Acceptance for mailing at the special rate of postage provided for in the Act of February 28, 1925,
embodied in paragraph 4, Section 538, P. L. & R., authorized October 10, 1947 


Copyright, 1949, by The American Psychological Association, Inc. 


Contents of Volume 33 


Articles

Aldrich, M. G. A Follow-up Study of Social Guidance at the College
    Level ..... 258
Anderson, R. G. Reported and Demonstrated Values of Vocational
    Counseling ..... 460
Angoff, W. H. An Empirical Approach to a Problem of Psychophysi- 


aS o A ue e E E a E EA 
Barnett, A. A Note on Mechanical Aptitude of West Texans
Bass, B. M. An Analysis of the Leaderless Group Discussion ..... 527
Browne, C. G. Study of Executive Leadership in Business. I. The R, A,
    and D Scales ..... 521
Carrington, D. H. Note on the Cardall Practical Judgment Test ..... 29
Chesler, D. J. Abbreviated Job Evaluation Scales Developed on the
    Basis of “Internal” and “External” Criteria ..... 151
Clark, K. E. A Vocational Interest Test at the Skilled Trades
    Level ..... 291
Daniels, E. E. and Hunter, W. A. MMPI Personality Patterns for
    Various Occupations ..... 559
DiMichael, S. G. Work Satisfaction and Work Efficiency of Vocational
    Counselors as Related to Measured Interests ..... 319
Donceel, J. F., Alimena, B. S. and Birch, C. M. Influence of Prestige
    Suggestion on the Answers of a Personality Inventory ..... 352
Dougan, C. P., Schiff, E. and Welch, L. Originality Ratings of
    Department Store Display Department Personnel
Edwards, A. S. Attention and Involuntary Movement
Elinson, J. Attitude Research in the Army ..... 1
Farr, J. N. and Jenkins, J. J. Tables for Use with the Flesch
    Readability Formulas ..... 275
Fehrer, E. and Strupp, H. The Effect of Equating Interest Test 
Moms for Prepac VAG: teat curr a4 te T E E, 222 
Feronte, N. C. Tests Used by United States Air Carriers ..... 445
Ford, A. Types of Errors in Location Judgments on Scaled Surfaces.
    I. Errors of Configuration ..... 373
Ford, A. Types of Errors in Location Judgments on Scaled Surfaces.
    II. Random and Systematic Errors ..... 382
Ghiselli, E. E. and Brown, C. W. The Prediction of Accidents of
    Taxicab Drivers ..... 540


Giese, W. J. and Ruter, H. W. An Objective Analysis of Morale ..... 421
Glanz, E. A Trade Test for Power Sewing Machine Operators ..... 436
Gordon, T. The Airline Pilot’s Job ..... 122


Greene, J. E., Osborne, R. T. and Sanders, W. B. A Window- 
Stencil Method for Scoring the Strong Vocational Interest Blank 
WIDOT Yate Mask te CREAN Lin) lore Gisieielelngugis eben ee ees 141 

Grether, W. F. Instrument Reading. I. The Design of Long-Scale
    Indicators for Speed and Accuracy of Quantitative Readings ..... 363
Grether, W. F. and Williams, A. C., Jr. Psychological Factors in
    Instrument Reading. II. The Accuracy of Pointer Position
    Interpolation as a Function of the Distance between Scale Marks
    and Illumination ..... 594
Hadley, J. M. and Kahn, D. F. A Comment on Wallace’s Note on
    “Factors Related to Life Insurance Selling” ..... 359
Hake, D. T. and Ruedisili, C. H. Predicting Subject Grades of
    Liberal Arts Freshmen with the Kuder Preference Record ..... 553
Harrell, T. W., Brown, D. E. and Schramm, W. Memory in Radio
    News Listening ..... 265
Harris, F. J. The Quantification of an Industrial Employee Survey.
    I. Method ..... 103
Harris, F. J. The Quantification of an Industrial Employee Survey.
    II. Application ..... 112
Holdrege, F. E., Jr. Implementing an Employee Opinion Survey ..... 428
Jaspen, N. A Factor Study of Worker Characteristics ..... 449
Jenkins, W. L. and Connor, M. B. Some Design Factors in Making
    Settings on a Linear Scale ..... 395
Jurgensen, C. E. A Fallacy in the Use of Median Scale Values in
    Employee Check Lists ..... 56
Kahn, D. F. and Hadley, J. M. Factors Related to Life Insurance
    Selling ..... 132
Karn, H. W. Performance on the File-Remmers Test, How Supervise?
    Before and After a Course in Psychology ..... 534
Katz, D. An Analysis of the 1948 Polling Predictions ..... 15
Kerr, W. A. and Martin, H. L. Prediction of Job Success from the
    Application Blank ..... 442
Kirchheimer, B. A., Axelrod, D. W. and Hickerson, G. X., Jr. An
    Objective Evaluation of Counseling ..... 249
Kirkpatrick, J. J. and Cureton, E. E. Vocabulary Item Difficulty
    and Word Frequency ..... 347
Knauft, E. B. A Selection Battery for Bake Shop Managers ..... 304
Kriedt, P. H. Vocational Interests of Psychologists ..... 482


Kriedt, P. H. and Clark, K. E. “Item Analysis” Versus “Scale
    Analysis” ..... 114
Kuntz, J. E. and Sleight, R. B. Effect of Target Brightness on
    “Normal” and “Subnormal” Visual Acuity ..... 83
Lawshe, C. H. and Farbro, P. C. Studies in Job Evaluation: 8.
    The Reliability of an Abbreviated Job Evaluation System ..... 158
Lawshe, C. H., Kephart, N. C. and McCormick, E. J. The Paired
    Comparison Technique for Rating Performance of Industrial
    Employees ..... 69
Levine, A. S. Correcting Special Ability Test Scores for General
    Ability ..... 566
Link, H. C. and Freiberg, A. D. The Psychological Barometer on
    Communism, Americanism and Socialism ..... 6
Locke, B. and Grimm, C. H. Odor Selection, Preferences and
    Identification ..... 167
Longstaff, H. P. and Laybourn, G. P. What Do Readership Studies
    Really Prove? ..... 585
Lyman, H. B. Flesch Count and Readership of Articles in a
    Midwestern Farm Paper ..... 78
McCandless, B. R. The Rorschach as a Predictor of Academic
    Success ..... 43
Mintz, A. and Blum, M. L. A Re-examination of the Accident
    Proneness Concept ..... 195
Mosier, M. F. and Kuder, G. F. Personal Preference Differences
    Among Occupational Groups ..... 231
Ostrom, S. R. The OL Key of the Strong Test and Drive at the
    Twelfth Grade Level ..... 240
Ostrom, S. R. The OL Key of the Strong Vocational Interest Blank
    for Men and Scholastic Success at College Freshmen Level ..... 51
Otis, J. L. and Chesler, D. J. A Short Test of Mental Ability ..... 146
Perloff, E. Prediction of Female Readership of Magazine Articles ..... 175
Pronko, N. H. and Bowles, J. W., Jr. Identification of Cola
    Beverages. III. A Final Study ..... 605
Rieger, A. F. The Rorschach Test and Occupational Personalities ..... 572
Rieger, A. F. The Rorschach Test in Industrial Selection ..... 569
Satter, G. A. Method of Paired Comparisons and a Specification
    Scoring Key in the Evaluation of Jobs ..... 212
Seashore, R. H., Dudek, F. J. and Holtzman, W. A Factorial
    Analysis of Arm-Hand Precision Tests ..... 579
Shaffer, R. H. Kuder Interest Patterns of University Business
    School Seniors ..... 489
Sherriffs, A. C. Modification of Academic Performance Through
    Personal Interview ..... 339


Sinaiko, H. W. The Rosenzweig Picture-Frustration Study in the
    Selection of Department Store Section Managers ..... 36
Strong, E. K., Jr. Vocational Interests of Accountants ..... 474
Thorndike, E. L. Note on the Shifts of Interest with Age ..... 55
Tiffin, J., Parker, B. T. and Habersat, R. W. Visual Performance
    and Accident Frequency ..... 499
Tinker, M. A. and Paterson, D. G. Speed of Reading Nine Point
    Type in Relation to Line Width and Leading ..... 81
Turner, W. D. Some Precautions in the Use of the Per Cent
    Method of Job Evaluation ..... 547
Wallace, S. R., Jr. A Note on Kahn and Hadley’s “Factors Related
    to Life Insurance Selling” ..... 356
Whitlock, J. B. and Crannell, C. W. An Analysis of Certain Factors
    in Serious Accidents in a Large Steel Plant ..... 494
Wittenborn, J. R. Certain Rorschach Response Categories and
    Mental Abilities ..... 330


Book Reviews

Achilles’ Management and the Psychologist: A Practical Guide on
    Psychology for the Business Executive: Donald G. Paterson ..... 187
Ahern’s Survey of Personnel Practices in Unionized Offices: C. E.
    Jurgensen ..... 187
Bowler and Dawson’s Counseling Employees: C. E. Jurgensen ..... 279
Buros’ The Third Mental Measurements Yearbook: E. Donald 


RISA OM NAR A aU MAM ao Uy Ca aogais a 181 
Burtt’s Applied Psychology: Steuart Henderson Britt............ 510 
Chapin’s Experimental Designs in Sociological Research: Donald E. 

PUDRO UL aS ie NR eT aOR E AAEE Aa, os. 93 
Clarke’s The Application of Measurement to Health and Physical
    Education: C. H. McCloy ..... 98
Darley’s The Use of Tests in College and Froehlich and Benson’s
    Guidance Testing: Milton E. Hahn ..... 96
deFord’s Psychologist Unretired, the Life Pattern of Lillian Martin: 

(Wiesel) RE EAA OE A ENE 612 
Doob’s Public Opinion and Propaganda: Alfred C. Welch......... 284 
Erickson’s A Basic Text for Guidance Workers: William A.
    McClelland ..... 97
Escalona’s An Application of the Level of Aspiration Experiment to
    the Study of Personality: Julian B. Rotter ..... 515
Evans’ An Introduction to Color: Miles A. Tinker ..... 416
Eysenck’s Dimensions of Personality: Arthur Weider ..... 614
Goldstein’s The Roots of Prejudice Against the Negro in the United
    States: Allen L. Edwards ..... 516


Jucius’ Personnel Management: Albert S. Thompson ..... 414
Kaback’s Vocational Personalities: An Application of the Rorschach
    Group Method: Boyd McCandless ..... 612
Kaufmann’s Your Job: William A. McClelland ..... 517
Kessler’s Rehabilitation of the Physically Handicapped: Donald 

rI oari a Aoo ety ico: <i\eie. A ANARE E 279 
Lall’s Mental Measurement: Henry E. Garrett ..... 415
Lawshe’s Principles of Personnel Testing: Edwin E. Ghiselli ..... 92
Lewin’s Resolving Social Conflicts: Horace B. English ..... 410
Linebarger’s Psychological Warfare: Clark L. Hosmer ..... 187
OSS Assessment Staff’s Assessment of Men: Selection of Personnel
    for the Office of Strategic Services: Donald E. Super ..... 511
Pigors and Myers’ Personnel Administration: A Point of View and
    a Method: Albert S. Thompson ..... 282
Planty, McCord and Efferson’s Training Employees and Managers
    for Production and Teamwork: Clifford E. Jurgensen ..... 611


Ross’s Measurement in Today’s Schools and Ross’s Chapter Exer- 
cises and Tests to Accompany Measurement in Today’s Schools: 


AW tons Wien OOM TOE E AS ES r e AEA E Hath 99, 
Rudolph’s Attention and Interest Factors in Advertising: Howard e oe 
WOU RSbALC a eels aas ean Aea AE estas atest ely oo 286 

Selekman’s Labor Relations and Human Relations: Brent Baxter.... 287 


Stouffer, et al. The American Soldier: Volume I, Adjustment 
During Army Life; Volume II, Combat and its Aftermath: 


Alena ADANWAN AEE cara O Aaa volora oral Ure ao alae E aaa eaten 609 
Terman and Oden’s The Gifted Child Grows Up: Twenty-five Years’ 
Follow-up of a Superior Group: Sidney L. Pressey............ 189 
Yoder’s Personnel Management and Industrial Relations: Albert S.
    Thompson ..... 280
Yoder, Paterson, et al. Local Labor Market Research: Arthur H.
    Brayfield ..... 411
Miscellaneous 


New Books, Monographs, and Pamphlets. 101, 191, 289, 418, 519, 615 
IARAA ART AS e A A E E A A A 100 


Journal of Applied Psychology 


Vol. 33, No. 1 February, 1949 


Attitude Research in the Army * 


Jack Elinson 


Troop Attitude Research Branch, Troop Information and Education 
Division, Special Staff, United States Army 


Many Army policies affecting troops depend on soldier reactions and 
cooperation for success. Necessarily then, those formulating policies need 
to know soldier reactions. Obviously, the larger an organization is, the 
harder it is for top management to keep in touch with what the troops 
(or employees) are thinking. Attitude surveys and opinion polls of
soldiers are a carefully developed means of helping higher headquarters
keep well-informed on these matters, as well informed as a good company
commander or good supervisor can be as a result of getting around in his 
company or unit and talking and working with his men. Such surveys 
can determine: 


In case of an existing policy . . .
    . . . do men know about it and understand it?
    . . . are they in sympathy with it?
    . . . do they feel it is being carried out as intended?

In case of a proposed policy . . .
    . . . what is likely to be its effect?
    . . . how are men likely to react to it?


Research on troop opinion helps provide answers to such questions. 
Broadly speaking, attitude research functions for Army administra- 
tion in the following five ways: 


1. As a means of anticipating troop reaction to a new administrative
policy.

2. As a guide in the formulation of administrative policy or of a change
in policy.

3. As a means of evaluating the operation of an existing administrative
program.

4. As a means of evaluating, experimentally, the effectiveness of an
information or training program.

5. As a source of quantitative information and evidence in support
of or against a proposed policy or change in policy.

* This article was originally prepared as an administrative memorandum for the use
of Lieutenant General Willard S. Paul, Director, Personnel and Administrative General
Staff, USA. As such it represents the thinking of Major Paul D. Guernsey, Chief,
Troop Attitude Research Branch, and Ira H. Cisin, Sr. Analyst in Charge of Unit
Studies, as well as that of the writer, who is Sr. Analyst in Charge of Surveys.


Anticipating Troop Reaction 


1. Troop Attitude Toward Army’s New Career Guidance Program. The 
Army’s new Career Guidance Program for enlisted personnel involves 
the establishment of a systematic promotion ladder within each type of 
service. Advancement up the ladder will be based essentially on Army- 
wide competitive testing. It was planned that the program would first 
go into effect for men in the Infantry. Before all the details of the 
Career Guidance Program had been decided upon, an attitude study was 
conducted among Infantrymen in order to get a preview of their reactions 
to the plan. 

The study indicated that, although the new Career Program was
acceptable to enlisted men in principle, many enlisted men were
opposed to some of the details of the proposed program. In addition
to revealing attitudes of men toward the various phases of the Career
Program, the survey also disclosed areas of ignorance about the Career
Program. Thus, while attitudes toward the Career Program may be
difficult to change and some alteration in the administrative details of
the program may appear necessary, areas of ignorance about the program
may be skillfully attacked with well-directed informational activity.


Formulation of Administrative Policy 


2. Troop Attitude on Order of Demobilization. Months in advance of
VE-Day, the War Department’s Special Planning Division anticipated
that the demobilization policy adopted on the defeat of Germany could
result in a morale disaster if the plan adopted were far out of line
with what troops would consider fair.

The problem then was to determine accurately what plan troops
would be likely to consider fair. Troop cross sections were surveyed by 
research teams in the United States and in overseas theaters as early as 
November 1943 and several times subsequently. 

Research revealed four factors to be critical in soldiers’ minds: (1)
length of service; (2) time overseas; (3) parenthood; and (4) combat
participation.

These were the four basic factors adopted in the Adjusted Service
Rating Plan (Point Score for demobilization).

One can read today in the book, just published, written by the 
Historical Division, Department of the Army, entitled The Army Ground 
Forces in World War II, how considerations of military necessity as 
well as troop attitudes dovetailed into final determination of admin- 
istrative policy. 


Evaluating an Existing Program 


3a. Trend Surveys in the Universal Military Training Experimental 
Unit. When the Universal Military Training Experimental Unit was 
set up at Fort Knox, Kentucky, the program included various innova- 
tions in military procedure: Code of Conduct (a form of demerit system), 
Trainee Courts, with men themselves sitting somewhat as a jury, con- 
siderable emphasis on the Chaplains’ activities, compulsory educational 
program, concerted attention to off-duty activities, etc. In order to 
measure the trend of trainee reaction to these innovations, and to the 
training program in general, attitude studies have been made among the 
trainees in each of the cycles going through the unit, with studies
conducted at the beginning, the middle and the end of each training
cycle. The attitudes of the officers and cadre of the unit have also
been obtained at
the end of each training cycle. From the reports on these studies the 
Commanding General of Army Ground Forces and the Commanding 
General of the unit have followed any shift in reactions as trainees 
progressed through their training. As modifications are made in the 
experimental training program at the unit, the studies are re-designed 
to evaluate the results. 

3b. Studies Pertaining to Recruitment for the Military Service. In the 
Fall of 1947, staff officers of the Army’s Military Personnel Procurement 
Service Division and their advertising agency, the N. W. Ayer Co., 
began to feel as a result of a continuing decline in enlistments that a 
change in advertising direction was indicated. 

Accordingly, two coordinated surveys were conducted: the first by the
Army, through its Attitude Research Branch, to survey newly enlisted
recruits; the other by the advertising agency, through a commercial
polling organization, to survey young civilian males and their parents.

The surveys yielded new insights into the problem of appropriate 
advertising for recruiting. For example, one traditional advantage of 
military service—early retirement and good retirement pay—was found 
to have practically no appeal among 17-18 year old youngsters, but was 
of considerable importance in the re-enlistment of older veterans.




Evaluating an Information Program 


4. Most staff sections need at one time or other to have troops in the 
field informed on certain matters. Questions looming in the minds of 
those who must get out information are: (a) how can the information 
be made to reach the largest proportion of those who should be reached? 
(b) what presentation will be most effective in getting the information 
read after it is gotten out? (c) how can the information be put across so
that it is most likely to be remembered once it is read? Once the
information is released, two further questions arise: (1) how widely has
it been seen, read and remembered? (2) did it accomplish what it was
supposed to accomplish?

Effectiveness of any single information tool or device, such as movies, 
radio programs, posters, pamphlets, training courses, and the like, can 
validly be determined only by a true experimental approach using as 
subjects both control and experimental groups. During the war, num- 
erous such studies were made by the Research Branch on Hollywood- 
produced films which were calculated to give the soldier a better under- 
standing of the issues of the war. Compared to broad cross-sectional 
sample surveys, experimental evaluation studies of this kind are usually 
less costly, but they remain inordinately extravagant in the use of re- 
search personnel time, and also involve more than the usual cooperation 
of operating officials, that is, commanding officers. Consequently, since 
the war, such studies of information media have been restricted to those 
of exceptional importance. One, currently under way, is an experimental 
evaluation of the new film produced under the auspices of the Surgeon 
General of the Army, entitled “Miracle of Living,” a film designed to 
produce certain changes in information, attitude, and behavior among 
enlisted men with respect to venereal disease. 


Evidence Pro and Con of a Proposed Policy or Change in Policy 


5. Virtually all Research Branch studies have been used or are
potentially useful for the purpose of providing a source of quantitative
information and evidence in support of or against a proposed policy or a
change in policy. In contrast to arm-chair opinion based on umbilical
meditations, quantitative evidence derived from scientific sampling
surveys is an invaluable tool in the hands of skillful administrators.
Among instances which may be mentioned of this use of attitude research
data are
studies of officer-enlisted man relationships used by the Doolittle Board 
in preparing its recommendations, attitudes toward Army Courts-martial 
procedure, survey of educational and recreational interests of soldiers, 
surveys among hospital patients with respect to treatment, surveys for 
the Quartermaster General on soldiers’ food and clothing preferences,
comparison of competing physical training programs, survey among
medical officers with respect to reasons why they would or would not 
accept commissions in the Regular Army, housing demands both among 
men in the Army and those about to be discharged, attitudes of officers 
toward logistical careers and training programs, and other studies of a 
more confidential nature. In short, as General Lanham has phrased it, 
attitude research has, within small and useful margins of error, proved 
itself to be the “morale radar” of the Armed Forces. 


Received June 18, 1948. 




The Psychological Barometer on Communism, 
Americanism and Socialism 


Henry C. Link and Albert D. Freiberg 
The Psychological Corporation, New York City 


The following results are taken from three Barometer surveys: the 
August, 1948 survey made with 5000 urban interviews; the October, 1948 
survey made with 1000 interviews but with a comparable sample; the 
November, 1948 survey made with 10,000 urban interviews. The dates 
and size of sample are given with each table. 

In the August Barometer of 5000 interviews, one of the questions 
asked was: 

Q. What, in your opinion, are the three most dangerous threats within our own 
country to a prosperous America? 

This question was asked specifically for the employee relations division 
of the General Electric Company. The answers showed that two threats, 
inflation and Communism, were considered by far the most dangerous, 
with strikes and industrial conflict a distant third. The per cents 
mentioning various dangers were: 


                 Threats to a Prosperous America

Answers, August 1948                                           %

Inflation, high prices                                        49.5
Other economic threats such as a depression, 4.4%;
    high or low wages, 1.7%; O.P.A. or lack of O.P.A.,
    .9%; miscellaneous, 1.8%; total                            8.8
Communism                                                     44.1
Fascism, .9%; Socialism, .5%; foreign spies and
    infiltration, 1.1%; lack of freedom, .8%; total            2.8
Strikes, struggle between capital and labor                   12.1
Power of unions, organized labor                               5.0
Taft-Hartley Act                                               3
Politicians, political parties, politics                      10.6
War talk, threat of war                                       10.6
Big business, monopolies, Wall Street, capitalism,
    high profits                                               2.9
Race prejudice and intolerance                                 8.8
Civil rights program, Jews, Negroes, immigrants                1.7
Atomic bomb, 1.8%; inadequate military defense,
    .6%; draft, .4%; poor foreign policy, E.R.P.,
    1.4%; the Russians, 1.6%; total                            5.8
Social and psychological threats:
    Lack of housing, 4.2%; alcohol, drinking, 3.8%;
    crime, delinquency, 3.6%; lack of religion, 3%;
    family trouble, 1.9%; poor education, 1.9%;
    movies, theatres, radios, comic books, .6%; lack
    of cooperation, 3%; greed, 1.7%; misc. 5.8%; total        29.5
Bad govt., bureaucracy, graft, govt. racketeers, govt.
    restrictions, govt. spending, high taxes; total            7.5
Natural disasters including fire, floods, rodents,
    drought, wastefulness of resources                         3.5
Miscellaneous                                                 11.3
Don’t know                                                    12.7
Total Interviews                                              5000


Is Communism Becoming Dangerous? 


The growing danger of Communism in the United States is further 
indicated by the answers in 1946 and in 1948 to this question: 


Q. It is being said that Communism is becoming a dangerous thing in the United 
States. Do you think this is true or not? 


Answers               April 1946    October 1948
                          %               %
True                     51.2            67.0
Not true                 34.1            24.5
Don’t know               14.7             8.5
Total Interviews         2500            1000
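
A reader wishing to gauge whether the rise from 51.2 to 67.0 per cent could be a sampling accident might apply a two-proportion z test to the figures in this table. The Python sketch below is offered only as an illustration and is not part of the original analysis. It assumes that the two surveys can be treated as independent simple random samples, which may not hold for the Barometer's interview samples, and the function name two_proportion_z is hypothetical.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference p2 - p1, using a pooled standard error.

    Assumes two independent simple random samples (an assumption made
    for this illustration, not a claim about how the surveys were drawn).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / standard_error


# "True" answers: 51.2% of 2500 interviews (April 1946)
# versus 67.0% of 1000 interviews (October 1948).
z = two_proportion_z(0.512, 2500, 0.670, 1000)
print(round(z, 1))
```

Under these assumptions the statistic comes to roughly 8.5, well beyond the conventional 1.96 threshold for the 5 per cent level. The sketch speaks only to sampling error and says nothing about the causes of the shift.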


This conviction is shared pretty much by all socio-economic groups, 
and by union and non-union families alike, as shown by the following 
table: 


                      Socio-Economic Group   Union Membership
Answers, Oct. 1948       A     B     C     D   Union Non-Union
                         %     %     %     %       %         %
True                    80    66    67    62      65        68
Not true                14    26    27    24      29        23
Don’t know               6     8     6    14       6         9
Total Interviews       100   300   400   200     278       722


Are Communists Traitors? 


In previous surveys, it was found that Communists in the United States 
were regarded by 77 per cent to be a fifth column, loyal to Russia first, 
rather than as a typical American political party. A majority favored 
outlawing the Communist party. The sharpest definition of this issue 
was made in the question: 




Q. Do you think a Communist is a traitor to the United States? 


Answers               January 1948    October 1948
                           %                %
Yes                       65               70.6
No                        18               18.0
Don’t know                17               11.4
Total Interviews         600               2500


Union and non-union members thought alike on this subject, whereas, 
by socio-economic groups, the “yes” answers ranged from 81 per cent in 
the “A” group to 62 per cent in the “D” group. 


Is Socialism Becoming Dangerous? 


Whereas Communism in the last two years has been sharply recognized 
by the American people as a threat to their institutions, their reactions to 
Socialism are quite different. Where 67 per cent say that Communism is
becoming dangerous, only 26 per cent say that Socialism is becoming
dangerous.


Q. It is being said that Socialism is becoming a dangerous thing in the United States. 
Do you think this is true or not? 


                             Socio-Economic Group   Union Membership
Answers, Oct. 1948    Total     A     B     C     D   Union Non-Union
                          %     %     %     %     %       %         %
True                   26.4    30    34    23    20      21        28
Not true               50.5    54    50    54    43      53        50
Don’t know             23.1    16    16    23    37      26        22
Total Interviews       1000   100   300   400   200     278       722


Are Communism and Socialism the Same? 


Because of these widely different reactions toward Communism and 
Socialism, this further question was asked: 


Q. Do you think that Socialism and Communism are about the same or are they
different?


                             Socio-Economic Group   Union Membership
Answers, Oct. 1948    Total     A     B     C     D   Union Non-Union
                          %     %     %     %     %       %         %
Same                   22.8    16    21    26    22      28        21
Different              60.9    73    66    59    53      57        62
Don’t know             16.3    11    13    15    25      15        17
Total Interviews       1000   100   300   400   200     278       722




Union members are more likely to regard them as the same than are 
non-union members, but the higher the educational level, the more likely 
people are to regard them as different. In answer to the question: 

Q. What difference do you think there is between them? 


Some of the principal reasons given were: Communism is totalitarian 
while Socialism isn’t; Socialism recognizes individual rights; Socialism is 
more liberal, more democratic; Communism means force, Socialism does 
not; Socialism is gradual, Communism is revolutionary; Communism is
bad, Socialism is good, etc., etc. However, 39 per cent gave no answer.


Specific Issues on Communism and Socialism 


The sharp repudiation of Communism as compared with Socialism is 
no doubt influenced by the strained relations between Russia and the 
United States. Therefore, it is of unusual significance to ascertain 
people’s reactions to specific measures which tend to bring about
Socialism or Communism, or both, in this country. In the previous survey,¹
we reported on such issues as government versus private ownership of 
manufacturing companies, who does the most for the good of the workers, 
preference for jobs in private industry or the government, and investing 
money in government bonds or private concerns. 

One of the questions asked in the October survey was: 


Q. Do you think government control of business would be a step toward Com- 
munism? Toward Socialism? 


Toward Toward 
Answers, Oct. 1948 Communism Socialism 
% % 
Yes 61.3 49.7 
No 22.5 18.9 
Don’t know 16.2 31.4 
Total Interviews 1000 1000 


The answers by union membership and socio-economic group to these 
two questions were: 
Q. Do you think government control of business would be a step toward Communism?

                      Socio-Economic Group   Union Membership
Answers, Oct. 1948       A     B     C     D   Union Non-Union
                         %     %     %     %       %         %
Yes                     64    65    63    50      59        62
No                      21    23    23    23      23        23
Don’t know              15    12    14    27      18        15
Total Interviews       100   300   400   200     278       722


1 Link, H. C. and Freiberg, A. D. The 97th psychological barometer. Journal of
Applied Psychology, 1948, 32, 443-451.


Q. Do you think government control of business would be a step toward Socialism?

                      Socio-Economic Group   Union Membership
Answers, Oct. 1948       A     B     C     D   Union Non-Union
                         %     %     %     %       %         %
Yes                     63    58    48    35      41        53
No                      16    19    19    20      21        18
Don’t know              21    23    33    45      38        29
Total Interviews       100   300   400   200     278       722


Not inconsistent with the answers to the question on the differences 
between Communism and Socialism were the answers to the following 
question: 

Q. Do you think a country can have democracy without having private capitalism?

                             Socio-Economic Group   Union Membership
Answers, Oct. 1948    Total     A     B     C     D   Union Non-Union
                          %     %     %     %     %       %         %
Yes                    20.9    22    19    21    24      24        20
No                     57.4    61    65    58    42      50        60
Don’t know             21.7    17    16    21    34      26        20
Total Interviews       1000   100   300   400   200     278       722


More than 42 per cent are either uncertain or say that private capitalism
is not necessary for democracy. This is especially interesting in view
of the recent statements by Dwight D. Eisenhower, in his installation
address as President of Columbia University and other talks, to the effect
that private property rights in the United States are the keystone of all
other democratic freedoms.


Price Control and the O.P.A. 


The readiness of the people to accept socialistic controls, or govern- 
mental controls which amount to the confiscation of property, is illus- 
trated by the answers to this question: 


Q. What do you think would do most to keep prices down: the O.P.A. and its price
ceilings, or free competition by business without any O.P.A.?

                                    Socio-Economic Group       Union Membership
Answers, November 1948      Total     A     B     C     D     Union  Non-Union
                              %       %     %     %     %       %        %
O.P.A. and its price
  ceilings                   41.5    33    35    [?]   [?]     [?]      [?]
Competition by business
  without any O.P.A.         44.5    58    52    43    30      37       48
Don’t know                   14.0    [?]   [?]   [?]   18      14       14
Total Interviews             5000   500  1500  2000  1000    1438     3562


Communism, Americanism and Socialism 11 


The opinions of people on price control have been subject to very 
wide fluctuations. In the spring of 1946, all polls showed a large majority 
of the public favoring the O.P.A. By the fall of 1946, this attitude had 
almost completely reversed itself. The results of our polls on this sub- 
ject are: 


                                  Oct.    Aug.    Nov.
Answers                           1946    1948    1948
                                    %       %       %
O.P.A. and its price ceilings     26.1    47.2    41.5
Competition by business
  without any O.P.A.              65.1    39.7    44.5
Don’t know                         8.8    13.1    14.0
Total Interviews                  2500    5000    5000
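The size of these fluctuations can be put in a single figure per poll. The following is a minimal sketch, not part of the original survey analysis: it takes the three sets of percentages from the table above and computes the lead of the O.P.A. answer over the free-competition answer in percentage points. The names `polls` and `opa_margin` are illustrative only.

```python
# Percentages transcribed from the O.P.A. trend table above.
polls = {
    "Oct. 1946": {"opa": 26.1, "competition": 65.1, "dont_know": 8.8},
    "Aug. 1948": {"opa": 47.2, "competition": 39.7, "dont_know": 13.1},
    "Nov. 1948": {"opa": 41.5, "competition": 44.5, "dont_know": 14.0},
}


def opa_margin(row):
    """Lead of the O.P.A. answer over free competition, in percentage points.

    A negative value means free competition was the more popular answer.
    """
    return round(row["opa"] - row["competition"], 1)


margins = {date: opa_margin(row) for date, row in polls.items()}

for date, margin in margins.items():
    print(f"{date}: {margin:+.1f} points")
```

On these figures the margin moves from a wide lead for free competition in 1946 to a lead for the O.P.A. in August 1948 and back to a narrow lead for free competition in November 1948.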


Socialistic Trends in Housing 


A further illustration of people’s readiness to accept socialistic meas- 
ures is provided by their answers to this question on housing: 


Q. How do you think the housing problem will be settled best: (a) by having the 
Federal Government furnish the money and plans, or (b) by leaving it to private in- 
dividuals and builders? 

                                    Socio-Economic Group     Union Membership
Answers, Oct. 1948          Total     A     B     C     D    Union  Non-Union
                              %       %     %     %     %      %        %
Having Federal Govt.
  furnish money and plans    37.0    28    30    39    48     44       34
Leaving it to private
  builders and individuals   51.8    64    59    51    36     44       55
Don’t know                   11.2     8    11    10    16     12       11
Total Interviews             1000   100   300   400   200    278      722


Other issues bearing on the conflict between Communism, Socialism, 
and traditional Americanism or a democracy based on private capitalism 
will be taken up from time to time. 


Attitude Toward the Taft-Hartley Law 


The feeling against the Taft-Hartley law among union members or 
union families is not nearly as unanimous as union leaders have pre- 
sented it to be. Of those questioned, 94 per cent answered “yes” to the 
question: Have you heard of the Taft-Hartley law which was passed by 
Congress to regulate unions, control strikes and get rid of Communist 
leaders? Then we asked: 




Q. During the past year do you think this law has done more harm than good or
more good than harm?

                                    Socio-Economic Group     Union Membership
Answers, Oct. 1948          Total     A     B     C     D    Union  Non-Union
                              %       %     %     %     %      %        %
More harm than good          24.8    15    23    29    24     34       21
More good than harm          39.7    60    50    34    25     29       44
Don’t know                   35.5    25    27    37    51     37       35
Total Interviews             1000   100   300   400   200    278      722


The Chief Victims of the Increase in the Cost of Living 
The answers to this question show one of the sharpest differences by 
socio-economic groups that we have ever recorded. 


Q. Who has suffered most from the increase in the cost of living: the workers on
salaries and wages, or the people who must live on the income from life insurance,
Government bonds, stocks and other savings?

                                                 Socio-Economic Group
Answers, Oct., 1948                     Total     A     B     C     D
                                          %       %     %     %     %
Workers on salaries and wages            36.3    25    27    38    52
People who must live on income from
  life insurance, etc.                   56.8    69    69    57    33
Don’t know                                6.9     6     4     5    15
Total Interviews                         1000   100   300   400   200


Family Prosperity 


Q. Is your family more prosperous (or better off) today than two years ago, less 
prosperous, or the same? 


In spite of high prices, most families continue to think of themselves 
as better off or as well off as they were two years ago. 


                                    Socio-Economic Group       Union Membership
Answers, November, 1948     Total     A     B     C     D     Union  Non-Union
More prosperous              [?]     [?]   [?]   [?]   [?]     [?]      [?]
The same                     45.8    49    46    46    44      43       47
Less prosperous              26.0    [?]   [?]   27    27      29       25
Uncertain                     4.0     3     4     4     4       3        4

Total Interviews             5000   500  1500  2000  1000    1438     3562


The above figures show a rather significant difference between the
opinions of union members and non-union members. Although the
unions are organized to obtain quick and broad wage increases, union
members do not consider themselves as well off in the scale of living as do
non-union members who have had to rely on themselves. Contrary to
the popular belief that the white collar workers are the principal losers
from the cost of living rise, this group, principally the “B” group,
considers itself better off than does the large group of skilled and
semi-skilled wage workers where unionism is strongest (groups “C” and
“D”). This may be due in part to the steadiness of their work as
compared with the time lost by wage earners through strikes, material
shortages and the indirect results of strikes in related industries.

We have now been asking this question for several years and some of 
the results are as follows: 


                     Oct.   Oct.   Oct.   Apr.   Oct.   Apr.   Oct.   Nov.
Answers              1941   1943   1945   1946   1946   1947   1947   1948
                       %      %      %      %      %      %      %      %
More prosperous       38     29     32     26     31     29     24     24
The same              47     46     51     48     44     42     46     46
Less prosperous       15     23     15     24     22     26     28     26
Don’t know                    2      2      2      3      3      2      4

Total Interviews    2000   2500   2500   2500   2500   2500   2500   5000


Probability of Another War 


The prospects of avoiding war, in people’s opinion, have improved 
during the past year, as shown by the October, 1948 survey. The ques- 
tion was: 


Q. Do you think we can make a lasting peace or do you think that there will be
another war within the next 20 years or so?

                               Feb.   Oct.   Oct.   Oct.   Oct.   Oct.
Answers                        1943   1944   1945   1946   1947   1948
                                 %      %      %      %      %      %
Lasting peace                   47     28     28     18     11     20
Another war within 20 years     43     54     59     74     77     69
Don’t know                      10     18     13      8     12     11
Total Interviews              2500   2500   2500   2500   2500   1000


Another question on this same subject was: 


Q. How about the next three or four years: another war or no war? 


Answers, October 1948 % 

War 35.3 
No war 42.8 
Don’t know 21.9 


Total Interviews 1000 




The Civil Rights Issue 
In view of the great controversy over the civil rights program, the 
following question was asked with interesting results: 


Q. Which would do more good for American Negroes: (a) passing laws to give them
equal rights with whites; (b) a program to teach white and Negro to get along together?

                                      Socio-Economic Group         Geographic Area
Answers, October, 1948        Total     A     B     C     D    East  Midwest  South  Far West
                                %       %     %     %     %      %      %       %       %
Passing laws for equal rights  11.7    11    12     9    18     12     13      [?]      9
A program to teach whites
  and Negroes to get along     74.6    [?]   [?]   [?]   [?]    [?]    [?]     [?]     [?]
[?]                            [?]     [?]   [?]   [?]   [?]    [?]    [?]      2       2
Don’t know                     [?]     [?]   [?]   [?]   [?]    [?]     9       5      12
Total Interviews               1000   100   300   400   200    370    [?]     [?]     110

* Less than [?]%
Explanation of the Surveys 


Each of these surveys was made with a true cross-section of the urban
population. The August and November surveys were made in 100 cities
and towns; the October survey was made in 47 cities and towns.




16 Daniel Katz 


1944 and 45.1 per cent in 1948. The reason why the 1948 election was 
close was not that there had been a gain in the Republican vote, but that 
there were defections from the Democratic vote to Governor Thurmond 
and Henry Wallace. Though the national percentage total for Dewey 
remained constant, there were interesting shifts in the sectional support 
he received in the two elections. The Republican candidate made slight 
gains in the industrial east and on the Pacific Coast, but suffered real 
losses in the west-central states, namely in Iowa, Kansas, Minnesota, 
Missouri, Nebraska, South Dakota and Wisconsin. Neither the polls 
nor the newspapers detected this very significant reversal of national 
voting behavior in which Truman carried a number of the farm states. 


State-by-State Errors of the Polls 


In their predictions of the specific states the polls almost doubled 
their average state error of 1940 and 1944. They were not as far off as 
in 1936, save that their error this time was one of sign as well as magni- 
tude, i.e. they missed the winning candidate. Crossley’s average state 
error of 4.4 was almost a percentage point better than Gallup’s. The 
Crossley poll missed 11 of the 48 states by six percentage points or more 
as against 16 similar misses by the American Institute of Public Opinion. 
It is significant that most of Gallup’s large errors were in states where the 
Republicans lost votes from 1944 to 1948. Where the Republicans made 
gains Gallup’s errors tended to be smaller. In other words, the Gallup 
prediction was that of a general increase, fairly evenly distributed over 
the nation, rather than a differential increase in certain states. This 
means that no simple correction for Gallup’s inflation of the Republican 
vote on a state-to-state basis would have remedied his inaccuracies. 
Table 2 presents the state-by-state errors of the Gallup and Crossley 
polls. Roper made no state estimates. 
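The two summary figures quoted for Crossley, an average state error of 4.4 and 11 misses of six points or more, can be checked against the Crossley column of Table 2. The sketch below is offered only as such a check. It assumes that the decimal points dropped in the scanned table belong where the one-decimal Truman percentages imply, and the function names are illustrative.

```python
# Crossley's signed state errors in percentage points, read from Table 2.
# Assumption: decimal points lost in the scan are restored (e.g. "+ 12" -> +1.2).
crossley_errors = {
    "Arizona": 1.2, "Arkansas": 1.3, "California": -3.6, "Colorado": -2.9,
    "Connecticut": -8.4, "Delaware": -0.8, "Florida": -0.8, "Georgia": 1.2,
    "Idaho": -5.0, "Illinois": -7.1, "Indiana": -4.4, "Iowa": -10.3,
    "Kansas": -2.6, "Kentucky": -3.7, "Louisiana": -4.8, "Maine": -3.3,
    "Maryland": -2.0, "Massachusetts": -7.7, "Michigan": -0.6,
    "Minnesota": -9.2, "Mississippi": 8.2, "Missouri": -3.1, "Montana": -4.1,
    "Nebraska": -3.8, "Nevada": -2.4, "New Hampshire": -5.7,
    "New Jersey": -4.9, "New Mexico": -4.4, "New York": -3.0,
    "North Carolina": -1.0, "North Dakota": -4.4, "Ohio": -4.5,
    "Oklahoma": -4.7, "Oregon": -4.4, "Pennsylvania": -4.9,
    "Rhode Island": -4.8, "South Carolina": 4.9, "South Dakota": -10.0,
    "Tennessee": -1.1, "Texas": 0.6, "Utah": -6.0, "Vermont": -5.9,
    "Virginia": -1.9, "Washington": -6.6, "West Virginia": -7.3,
    "Wisconsin": -7.7, "Wyoming": -5.6,
}


def average_state_error(errors):
    """Mean of the absolute errors, ignoring the direction of the miss."""
    errors = list(errors)
    return sum(abs(e) for e in errors) / len(errors)


def large_misses(errors, threshold=6.0):
    """Number of states missed by `threshold` percentage points or more."""
    return sum(1 for e in errors if abs(e) >= threshold)


average = average_state_error(crossley_errors.values())
misses = large_misses(crossley_errors.values())
print(f"states: {len(crossley_errors)}, average error: {average:.1f}, misses of 6+: {misses}")
```

The average is taken over absolute values because, as noted above, the 1948 error was one of sign as well as magnitude; a signed mean would let the few overestimates of Truman in the South offset the underestimates elsewhere.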


General Reasons for the Failure of the Polls 


To the world of applied research the poor predictive performance of 
the polls was as much of an upset as the election of President Truman 
was to the newspaper world. Yet from a scientific point of view there 
was evidence, before November 1948, that the polls could not continue 
their successful record without a change in basic methodological approach 
as well as in specific techniques. The general philosophy of the pollsters 
was one of rule-of-thumb procedure rather than sound theory and method. 
What had worked in the past was accepted at face value without an
analysis of why it had worked or an analysis of the conditions under
which it had worked. Moreover, their specific techniques of sampling, of
interviewing, of research design were known to have serious weaknesses. 

The pollsters began in 1936 with an improvement upon the Literary 


Table 2
State-by-State Errors of Gallup and Crossley

                    % of Major        Error in Percentage Points
                    Party Vote
                    for Truman *        Gallup        Crossley
Alabama **
Arizona                53.8             - 0.8          + 1.2
Arkansas               61.7             - 8.7          + 1.3
California             47.6             - 4.6          - 3.6
Colorado               51.9             - 2.9          - 2.9
Connecticut            48.4             - 4.4          - 8.4
Delaware               48.8             - 1.8          - 0.8
Florida                48.8             - 3.8          - 0.8
Georgia                60.8             - 2.8          + 1.2
Idaho                  50.0             - 3.0          - 5.0
Illinois               50.1             - 4.1          - 7.1
Indiana                48.4             - 4.4          - 4.4
Iowa                   50.3             - 7.3          -10.3
Kansas                 44.6             - 5.6          - 2.6
Kentucky               56.7              [?]           - 3.7
Louisiana              32.8             + 6.2          - 4.8
Maine                  42.3             - 0.3          - 3.3
Maryland               48.0             - 4.0          - 2.0
Massachusetts          54.7             - 9.7          - 7.7
Michigan               47.6             - 3.6          - 0.6
Minnesota              57.2             -11.2          - 9.2
Mississippi             9.8             + 5.2          + 8.2
Missouri               58.1             - 6.1          - 3.1
Montana                53.1             - 3.1          - 4.1
Nebraska               45.8             - 7.8          - 3.8
Nevada                 50.4             - 3.4          - 2.4
New Hampshire          46.7             - 2.7          - 5.7
New Jersey             45.9             - 3.9          - 4.9
New Mexico             56.4             - 5.4          - 4.4
New York               45.0             - 6.0          - 3.0
North Carolina         58.0             - 7.0          - 1.0
North Dakota           43.4             - 5.4          - 4.4
Ohio                   49.5             - 7.5          - 4.5
Oklahoma               62.7             - 7.7          - 4.7
Oregon                 46.4             - 4.4          - 4.4
Pennsylvania           46.9             - 2.9          - 4.9
Rhode Island           57.8             - 3.8          - 4.8
South Carolina         24.1             +13.9          + 4.9
South Dakota           47.0             - 6.0          -10.0
Tennessee              49.1             + 1.9          - 1.1
Texas                  65.4             + 0.6          + 0.6
Utah                   54.0             - 4.0          - 6.0
Vermont                36.9             - 1.9          - 5.9
Virginia               47.9             - 3.9          - 1.9
Washington             52.6             - 5.6          - 6.6
West Virginia          57.3             -11.3          - 7.3
Wisconsin              50.7             - 9.7          - 7.7
Wyoming                51.6             - 4.6          - 5.6
Average State Error                       5.2            4.4

* Final figures compiled by the Associated Press and reported by the New York
Times December 11, 1948.

** Truman not on the ballot.




Digest biased method of sampling. Since 1936 they made some minor 
improvements and learned either to take advantage of their compen- 
sating errors or to correct for their biases, but they never made major 
advances in methodology. Why, then, did they do so well in 1940 and 
1944 with their methods and techniques and so poorly in 1948? The 
two main reasons seem to be: (1) Their experience with techniques and 
corrections in Roosevelt elections. With a change in the political scene 
their procedures no longer functioned effectively. Thus Gallup and 
Crossley started in 1936 with an error of 6.9 percentage points, improved 
their performance in subsequent Roosevelt elections, but moved back 
toward their original starting point when they attempted a presidential 
election in which different factors were operative; (2) The Roosevelt 
elections were highly structured situations in which the dominant per- 
sonality of Roosevelt crystallized attitudes and opinions. With this 
definite bipolarity of attitude it was not difficult, even with poor tech- 
niques, to make election predictions. 

Moreover, the polls have never adequately examined the nature of 
the problem of prediction. In basic science, predictions are made not for 
an open system of events but in terms of contingent conditions. In 
applied science, the engineer or the weather forecaster makes some 
estimates of the possible determinants of the process or event he is 
attempting to predict. Similarly in attempting to make predictions 
about social behavior, the social scientist must take into account the 
relevant field of forces. He cannot merely single out a behavioral or 
attitudinal trend and predict its repetition. Yet this is essentially what 
the pollsters attempt to do. They reproduce the national election in 
miniature and assume that the final election will be a repetition of the 
trend they have measured without recourse to the many determinants 
of voting behavior. 

It should be emphasized at the start that their fundamental mistake is
not to be found so much in any one technique, such as quota sampling or 
fixed-alternative questions, as in poor research design. In basic science 
and in applied science we attempt to measure the relationship between 
two variables and seek to establish causal connections. We do this, 
moreover, at some level of generality beyond the specific content of one 
particular situation so that we can build up generalizations which apply 
to the same type of social process. This means that we do more than 
report the given percentage of people who favor the Marshall Plan or say 
that they will vote for President Truman. This means, moreover, that 
we must conceptualize and identify the important variables and obtain 
systematic measures of them. 

To apply this logic of research design to election prediction, we
should set up a number of studies designed to measure the determinants




Analysis of 1948 Polling Predictions 19 


of voting behavior or turn-out and the causal conditions affecting political 
conviction. It is not enough to have some rough measure of background 
variables such as income level, or amount of schooling, or even union 
membership. We need some picture, in addition, of the intervening 
variables which will give us the perceptions and attitudes related to 
political parties and political party candidates. How much of this can 
be done by public opinion polls is a debatable question, but it is scarcely 
in their best interests to continue to lag behind the advances made by 
psychologists and social scientists in their studies of human behavior. 
These points have all been made before the 1948 polling debacle and can 
be found in the writings of A. Campbell, D. Cartwright, R. Crutchfield, 
D. Krech and the present writer.¹


Sources of Error in the 1948 Polls 


It will never be possible to make a precise assessment of the contribu- 
tion of every factor to the error of the polls. Since the polls did not set 
up adequate hypotheses about voting behavior and political preferences 
during the campaign, the data are not now available for analysis. It is 
not even possible to go back and reinterview the same respondents 
sampled by the polls because the polls did not take names or addresses. 
There are some limited panel studies where this is being done and they 
will throw some light upon the problem. Gallup did ask a sample of 
respondents to return postcards after the election to indicate how they 
voted, but the selective bias in a mail-return makes these data hazardous 
to interpret. 

It is usually assumed that the important sources of error, however, 
are to be found in: (1) differential turn-out; (2) the undecided voter; (3) 
the changing voter; and (4) the representativeness of the sample. 


Differential Turn-out 


Australia is the pollsters’ Utopia, for in Australia the law requires 
all citizens to vote. It must be remembered that in our country fore- 
casting an election involves two predictions: an estimate of how voters 
feel about the candidates and an estimate of which voters will go to 
the polls on election day. In general the polling organizations make 
no systematic correction for turn-out but depend upon their educational 
bias in sampling for the major adjustment. 
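The two predictions can be separated in a small calculation. What follows is a sketch under stated assumptions, not a description of any polling organization's procedure: the two groups, their preferences and their turn-out rates are hypothetical numbers chosen only to show how a forecast based on preference alone differs from one that also weights each group by its likelihood of voting.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    population_share: float   # share of the eligible electorate (hypothetical)
    truman_preference: float  # share preferring Truman (hypothetical)
    turnout: float            # probability that a member votes (hypothetical)


def forecast(groups, use_turnout):
    """Expected Truman share, optionally weighting each group by its turn-out."""
    weights = [
        g.population_share * (g.turnout if use_turnout else 1.0) for g in groups
    ]
    weighted_preference = sum(
        w * g.truman_preference for w, g in zip(weights, groups)
    )
    return weighted_preference / sum(weights)


# Hypothetical electorate: a smaller high-turn-out group and a larger
# low-turn-out group with opposite leanings.
groups = [
    Group("upper income", population_share=0.3, truman_preference=0.35, turnout=0.85),
    Group("lower income", population_share=0.7, truman_preference=0.60, turnout=0.60),
]

preference_only = forecast(groups, use_turnout=False)
with_turnout = forecast(groups, use_turnout=True)
print(f"preference only: {preference_only:.3f}, with turn-out: {with_turnout:.3f}")
```

In this illustration the estimate of feeling alone overstates the candidate whose supporters vote less regularly, which is why the second prediction cannot be left to an incidental bias in the sample.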

1 A. Campbell. Polling, open interviewing, and the problem of interpretation. J.
Soc. Issues, 1946, 2, 67-71; D. Cartwright. Review of G. Gallup’s A guide to public
opinion polls. J. consult. Psychol., 1945, 9, 201-202; R. Crutchfield and D. Krech.
Theory and problems of social psychology. New York: McGraw-Hill, 1945; D. Katz.
Survey technique and polling procedure as methods in social science. J. Soc. Issues,
1946, 2, 33-44; and D. Katz. The interpretation of survey findings. J. Soc. Issues,
1946, 2, 62-66.




One explanation of both Dewey’s defeat and the pollsters’ failure is 
that the Republicans stayed away from the polls in greater numbers than 
they usually do, as compared to the customary voting behavior of the 
Democrats. The reasons marshalled to support this theory are varied 
and not too consistent. For example, the opinion polls defeated them- 
selves by making the Republicans overconfident and so less energetic 
about getting out the vote; or the Republicans were apathetic about 
their standard bearer; or the farmers were too busy getting in the harvest 
on election day to go to the polls. 

The hypothesis of Republican overconfidence, or indifference, in its 
effect upon turn-out, makes sense only if we assume that the polls were 
accurate in their original estimates about the wishes of the people. It 
can be argued more plausibly that the nature of the turn-out in 1948 
reduced rather than increased the prediction error. Neither party did a 
good job on turn-out in 1948. Many Democrats as well as Republicans 
stayed away from the polls. Against the overconfidence of the Repub- 
licans was the lack of motivation on the part of millions of Democrats 
who idolized Roosevelt and found Truman a weak substitute. Since 
there are considerably more people in the country who consider them- 
selves Democrats than consider themselves Republicans and since young 
people who come of voting age are more likely to favor the Democratic 
than the Republican ticket, the chances are that if the national turn-out 
had been as heavy in 1948 as in 1940, there would have been a Truman 
landslide and not a Dewey victory. The overconfidence hypothesis 
ignores the fact that party machines are organized on a local and state 
basis. Even though the Republicans thought the presidential election 
was in the bag, there were many Congressional, state and local offices in 
doubt, for which it was necessary to turn out the vote. And the states in 
which overconfidence should have been the highest according to this 
theory were the states where Dewey actually made gains as in Maine 
and Vermont. 

There is no proof that the upper-income Republican groups relative 
to lower-income groups failed to vote in greater numbers in 1948 than in 
the past. The figures in Table 3 show turn-out by economic groups for 
1948 and the heavy-voting year of 1940. 

The NORC survey in 1940 showed that the lowest income group
stayed away from the polls in a ratio of three to one compared to the
highest income group. The Roper figures show an even higher ratio
in 1948 in favor of greater turn-out among the upper-income group. It
should be stated, however, that the comparability of these figures leaves
much to be desired. They were obtained by two different organizations
and the income groupings may vary considerably. They are suggestive,
however, in their implication that the 1948 turn-out actually favored
the Republicans. The same inference was made before the election when
experts asserted that a turn-out of under 49,000,000 would help the
Republicans.


Table 3
Turn-out by Economic Groups in 1948 and in 1940

                        Did Not Vote
Economic      1948 Post-election     1940 Post-election
Group           Poll by Roper          Survey by NORC
A                   11.3%                  16.0%
B                   14.6
C                   26.7                    [?]
D                   40.6                   47.0
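The ratios referred to in the text follow directly from the A and D rows of Table 3. A minimal sketch of the arithmetic, using only those four legible figures (the variable names are illustrative):

```python
# Per cent who did not vote, highest (A) and lowest (D) economic groups, Table 3.
did_not_vote = {
    "1948 (Roper)": {"A": 11.3, "D": 40.6},
    "1940 (NORC)": {"A": 16.0, "D": 47.0},
}


def nonvoting_ratio(row):
    """How many times more often group D stayed home than group A."""
    return row["D"] / row["A"]


ratios = {survey: nonvoting_ratio(row) for survey, row in did_not_vote.items()}

for survey, ratio in ratios.items():
    print(f"{survey}: {ratio:.2f} to 1")
```

The 1940 ratio comes out just under three to one and the 1948 ratio somewhat above it, subject to the caution already stated that the two organizations may not have defined the economic groups alike.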


Similar interpretations come from a study of the farm and city vote. 
If the Truman victory were a matter of differential turn-out, then we 
would expect bigger Democratic majorities than usual in the industrial 
centers where the unions and Democratic machines are entrenched. 
But this was not the case. Truman lost a number of industrial eastern
states and ran surprisingly well in the farm belt and in rural districts. 
Preliminary analysis of rural and urban counties corroborates the na- 
tional trend. Dewey lost not because the Republican farmers stayed 
away from the polls but because many of them voted for Truman. 

Though turn-out does not seem to be the explanation for the
difficulties of the pollsters, it is essential in future research that attempts be
made to measure, or take into account more thoroughly, the factors 
which affect turn-out. Certainly much more can be done to get at the 
spontaneous forces within voters which get them to the polls on election 
day. Crossley has made a start on this problem with questions on 
voting intention and certainty of voting but in addition we need to 
study the potency of the individual’s involvement in both the national 
and local elections, the importance the individual attaches to his own vote, 
and his feeling of responsibility toward voting participation in the 
democratic process. The external factors are more difficult to get at
but unless we know something about the relative strength of political 
machines in various states and the pressures of the individual’s own social 
group, we are handicapped in making predictions. 


The Undecided Voter 


A larger proportion of people than usual could not, or would not,
tell interviewers how they were going to vote on election day. The
Roper survey in August, 1948 found 15.4 per cent undecided,
and Gallup and Crossley still had about 8 per cent undecided in October.
The polling predictions were computed with the undecided group omitted
on the assumption that these people, to the extent they voted, would
distribute themselves among the presidential candidates in the same
proportions as the decided voters.

The mistake in this assumption was that the great majority of the 
undecided were not at a mid-point between the two major candidates. 
Many people were undecided between Truman and the minor party 
candidates or between Truman and not voting at all. There is direct 
and indirect evidence that the undecided vote went more heavily to 
Truman than to Dewey. 

The Survey Research Center of the University of Michigan asked 
people about their voting intentions in an October study which was being 
conducted for another purpose than election prediction. The question 
was asked to get a measure of political identification for correlational 
analysis. Since names and addresses were available the same panel was 
re-interviewed after the election and queried about their actual voting 
behavior. The people who originally said they did not know how they 
would vote, now reported that they voted for Truman in a ratio of two 
to one. Fewer of this undecided group reported that they voted than 
of the decided group. Though the national sample was small, the results 
are consistent with other findings. Similar evidence will be available 
from other panel studies. 

The indirect evidence comes from an examination of the undecided 
group in pre-election surveys. Roper’s results show that many more of 
the people, who did not know how they would vote, considered themselves 
Democrats than considered themselves Republicans. Roper also asked 
about such issues as rent control, social security measures, and the Taft- 
Hartley act. The undecided group were not consistent in their re- 
sponses, but on the whole, they resembled Truman supporters more than 
Dewey supporters. 

The undecided voter was thus one source of the polling error in 
prediction. But because the undecided group was after all a minority 
and because they did not turn out to vote as much as the decided group, 
they could not have contributed more than about one per cent of the 
five per cent prediction-error. 
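The order of magnitude of this contribution can be illustrated with a short calculation. It is a sketch under stated assumptions rather than a reconstruction of the actual data: the 8 per cent undecided and the two-to-one split for Truman come from the text, while the relative turn-out of the undecided (0.6 of the decided group's) and the Truman share among decided respondents (47 per cent) are hypothetical values chosen for illustration.

```python
def truman_share_shift(undecided, undecided_turnout, undecided_truman, decided_truman):
    """Percentage-point gain in Truman's share over a forecast that omits the undecided.

    Turn-out of the decided group is normalised to 1, so `undecided_turnout`
    is a relative rate.
    """
    decided = 1.0 - undecided
    undecided_voting = undecided * undecided_turnout
    truman_votes = decided * decided_truman + undecided_voting * undecided_truman
    actual_share = truman_votes / (decided + undecided_voting)
    return 100 * (actual_share - decided_truman)


shift = truman_share_shift(
    undecided=0.08,          # about 8 per cent undecided in October (from the text)
    undecided_turnout=0.6,   # hypothetical: the undecided voted less often
    undecided_truman=2 / 3,  # two to one for Truman (from the text)
    decided_truman=0.47,     # hypothetical share among decided respondents
)
print(f"shift toward Truman: {shift:.2f} percentage points")
```

Under these assumptions the omitted group moves the result by roughly one percentage point, which agrees with the estimate given above that the undecided could account for only a small part of the total error.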

The failure of the polls to study the undecided vote illustrates the 
lack of research design in their methods. It would have been possible 
to have set up systematic hypotheses about this group and explored the 
nature of their indecision, the reasons for their indecision, their basic 
political philosophy, etc. The Roper poll had some data on the un- 
decided group but it made no real use of the information it had. In 
the past the undecided vote was not a problem in election prediction 
and even in 1948 it may not have been a major factor. Nonetheless it
may loom larger in future elections. But more important is the
consideration that it is related in its psychological dimensions to the problem
of the changing voter.

The Changing Voter 


Another source of polling error was the fact that some people told the 
interviewer one thing and then behaved differently on election day. 
This distortion is twofold in nature. Part of the problem is a matter of 
interviewing skill and technique, in that people may give what seems 
like a socially acceptable answer in the interviewing situation. If Dewey 
is supposed to be the popular candidate, if his is the name they ordinarily 
hear in everyday conversation as the assured winner, and if they have 
some doubt about Truman, people may find it easier to say “Dewey” 
when asked the direct question about voting preference. There is no 
documentation for this possible source of error in the last election but 
it suggests the importance of thorough interviewing and real training 
of interviewers. 

The second part of the problem concerns genuine psychological change. 
In a difficult choice-situation some people may give one response to an 
interviewer but when confronted with the reality of the election booth 
they may change their minds. Take, for example, the supporter of the 
New Deal, dissatisfied with Truman, who says before election he will 
vote for any candidate save Truman. When the chips are down, how- 
ever, he returns to the party most representative of his beliefs. Another 
type of change is typified by the farmer who originally planned to vote 
for Dewey, became alarmed at the fall in farm prices and the Republican 
position on the support of farm prices, and voted in terms of what seemed 
to him his best self-interest. 

Panel studies give some support to the changing voter as a source 
of polling error. More people who had said they would vote Republican
reported after the election that they had voted Democratic than reported the reverse.
It is not possible to estimate precisely how much this was responsible 
for the prediction failure. In a post-election survey (see Table 4), 
Gallup found that in general Dewey voters had made up their minds 
earlier in the campaign than Truman voters. 

These findings indicate that the 1948 political situation had a different 
psychological structure for many people than the Roosevelt elections. 
In the final analysis most people may have voted for the party which 
represented their welfare as they saw it. But they did not crystallize 
their beliefs until they had to. They finally reached a decision con- 
sonant with their basic attitudes. This may be why so many people 
who talked against Truman were so delighted with the election returns. 


2 This hypothesis about the changing voter was suggested by R. Crutchfield. 




Table 4 
Gallup Post-election Survey of Time Voters Made Up Their Minds 
Definitely Made Up 

Their Minds Truman Dewey All voters 
Before campaign started 46% 64% 54% 
Early in campaign 11 12 12 
First half, Oct. 4 2 3 
Second half, Oct. 13 5 9 
Election day 5 3 4 
Indefinite 21 14 18 


100% 100% 100% 
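The difference between the two groups of voters is easier to see when the late categories of Table 4 are combined. The following sketch simply adds the rows for October and election day; the grouping of these three rows as "late" is an illustrative choice, not one made in the Gallup report.

```python
# Percentages from Table 4, by time the voter definitely made up his mind.
decision_time = {
    "Before campaign started": {"Truman": 46, "Dewey": 64, "All voters": 54},
    "Early in campaign": {"Truman": 11, "Dewey": 12, "All voters": 12},
    "First half, Oct.": {"Truman": 4, "Dewey": 2, "All voters": 3},
    "Second half, Oct.": {"Truman": 13, "Dewey": 5, "All voters": 9},
    "Election day": {"Truman": 5, "Dewey": 3, "All voters": 4},
    "Indefinite": {"Truman": 21, "Dewey": 14, "All voters": 18},
}

# Illustrative definition of a late decision: October or election day.
LATE_PERIODS = ("First half, Oct.", "Second half, Oct.", "Election day")
COLUMNS = ("Truman", "Dewey", "All voters")

late_deciders = {
    column: sum(decision_time[period][column] for period in LATE_PERIODS)
    for column in COLUMNS
}
column_totals = {
    column: sum(row[column] for row in decision_time.values()) for column in COLUMNS
}

print(late_deciders)
```

By this count roughly twice as large a share of Truman voters as of Dewey voters settled on their choice in October or on election day.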


The lesson for election prediction, presented by the undecided and 
changing voter, is not primarily the necessity of polling until the last 
moment. Trend studies must be made, but adequate research in this 
field should be more than the projection of a single attitudinal trend. 
The polls can interview 48 hours before the election and still miss the
voter who reacts differently to the reality of the election booth than to
the straw ballot. The real lesson is that the determinants of political
behavior must be systematically explored. We need to study how the
voter perceives political parties and candidates; for example, to what
extent is he politically-minded in viewing a candidate and a party as an
instrument for protecting and improving his interests, to what extent is
he reacting to the personalities of the candidates, etc. We need,
furthermore, to investigate the basic social, economic, and political beliefs and
their relative importance to him.

To do thorough studies of this kind requires much more theoretical 
planning than the polls have thus far done. They occasionally ask 
questions on issues, but they have not systematically designed studies to 
give answers to problems of political motivation. These studies, whether 
conducted by the pollsters or by psychologists, are indispensable to the 
making of predictions. In addition to better research planning, the use 
of intensive interviewing, even on a pre-test basis, could get the signifi- 
cant frames of reference in which people are thinking. The usual polling 
pre-test is one of testing question-wording, not one of the experimental 
investigation of the dimensions of the problem under study. Lazarsfeld 
has pointed out how adequate pre-testing with intensive interviewing 
could be used to develop more valid ballots with pre-coded answers.³


The Representativeness of the Sample 
To estimate turn-out, to allocate the undecided vote, to gauge the 
stability of voting preference all require good interviewing and research 


3 P. Lazarsfeld. The controversy over detailed interviews. Publ. Opin. Quart.,
1944, 8, 38-60. 




design which goes beyond the direct question of voting intention 
into the related causal factors. In addition, however, there is the prob- 
lem of sampling, of obtaining a truly representative cross-section of the 
electorate. The quota-control method of the polls has been under fire for 
some time and since it is a more palpable weakness than lack of study 
design, the controversy over polling methods will focus unduly about it. 

The quota-control method sets up a cross-section which in theory 
represents the larger population proportionately in terms of sex, age, 
socio-economic status, urbanization, and geographical area. Inter- 
viewers are assigned quotas on this basis and told to bring back results 
from respondents of given characteristics. It is sometimes contended 
that the quota-control method is vulnerable because it does not stratify 
on some variable related to voting behavior such as union membership. 
This argument fails to get at the essential weakness of quota-control 
sampling. If the cross-section obtained by the quota method really 
achieved a random representation of the population according to the 
controls it employs, the chances are all in favor of other characteristics 
such as religion, occupation and even union membership being properly 
represented. 

The real defect of the quota-control method is in its execution. Since 
there are no strict controls over interviewers, they in fact select the 
sample. The result is not a random, or true probability sample. Inter- 
viewers are told to bring back results from so many respondents in the 
D, or below-average economic category. What constitutes a D re- 
spondent and how D respondents are to be selected is too much a matter 
of interviewer judgment. In practice interviewers filling their quotas 
take people who are physically and psychologically more accessible. As 
middle-class members themselves they under-represent the poorer people 
and to some extent the very wealthy. Since the wealthy are much less 
numerous, these are not compensating errors. Moreover, interviewers 
tend to get respondents more like themselves on other counts than would 
be found in a truly representative sample. 

The under-representation of the lower income groups is in evidence 
whenever a quota sample is broken against some measure indicative of 
socio-economic status such as education or telephone ownership. Un-
corrected quota samples employing no special devices to limit inter-
viewers traditionally find between 12 and 20 per cent too few people in
the lower education brackets. 

In 1940 and 1944 Gallup and Crossley corrected indirectly for the 
quota bias by adjusting for past voting behavior. In 1948 Gallup also 
used the respondents’ answers to questions on education to correct his 
final sample. In spite of corrections Gallup and Crossley never suc- 
ceeded in the Roosevelt elections in eliminating their Republican over- 
estimates. 




Roper does not employ corrections but stands by the raw data from 
his sample. In the Roosevelt elections he had the advantage of sizable 
compensating errors in that his southern sample was much too Demo- 
cratic and his northern sample much too Republican. In 1940, for ex- 
ample, Roper overestimated the Democratic vote in the East South 
Central states by 12.5 per cent and in the South Atlantic states by 8.3 
per cent. To balance this, however, he underestimated the Democratic 
strength in the West North Central states by 6.8 per cent and in the 
Mountain states by 10.5 per cent. In 1948 the candidacy of Governor 
Thurmond knocked this compensating error into a cocked hat. In some 
southern sections Roper interviewers found Truman and Thurmond 
tied. Without his usual southern overweighting of the Democratic vote, 
there was nothing to compensate for the northern Republican inflation 
and Roper after three highly accurate predictions was not even close in 
1948. If his figures are corrected for the educational bias in his quota
sample, however, the Roper figures are very much like the Gallup and 
Crossley predictions. 

It is clear, then, that the largest part of the Roper error was due to 
the uncorrected quota-control method of sampling. It is not clear, 
however, how much of the remaining error in prediction (about five per 
cent) is due to poor sampling, which cannot be corrected for, and how 
much to non-sampling factors. Those who defend quota sampling admit 
that area, or probability, sampling would have given slightly greater 
accuracy but they dismiss sampling as a minor factor. Their logic on 
this score is interesting in that their final argument is that quota sampling 
costs less than area sampling. 

That poor sampling did contribute to the polling error in spite of the 
corrective adjustments made in the data seems a sound interpretation 
for these reasons: 


1. Even in the Roosevelt elections, with a highly structured situa- 
tion, with attitudes and opinions well crystallized before election day, 
Gallup and Crossley were not able to correct away their inflation of the 
Republican totals. In 1940 Gallup missed 16 states by three percentage 
points or more, but only one of these errors was in the direction of over- 
estimating the Democratic vote. In 1944 he was off the mark by three 
percentage points or more in 22 states. Only two of these errors were 
overpredictions of the Democratic vote. Similarly, Crossley had 13 state 
errors, in 1944, of three percentage points or greater, but only one of these 
favored the Democratic candidate. 

2. Corrective adjustments introduced into data to compensate for 
poor sampling are always limited by the poor sampling that was done in 
the first place. To inflate an under-represented group by some corrective
weight does improve the whole sample to some extent relative to the
neglected group, but it cannot insure the representativeness of this group. 
If, for example, we weight up the people with no better than grade school 
education by fifteen per cent, we still have not improved the character 
of the sample for this group even though we have improved its place in 
the sample. 

3. The area method of sampling was more accurate in the 1948 
elections than the quota sample but it was used in too limited a way to 
draw definite conclusions. The study of the University of Michigan 
Survey Research Center, previously referred to, used a national area 
sample and found an even division among the decided voters between 
Truman and Dewey. This evidence is limited in that the sample was 
small and in that the Center’s methods of interviewing also differ from 
polling methods. The Elmira study of Lazarsfeld, using an area sample, 
missed the vote in that town, however, by six per cent. Gallup’s quota 
sample for New York state was also six per cent in error. The clearest 
evidence is from the University of Washington Survey group which tried 
both area and quota sampling for the state of Washington. The area 
sample had an error of 2 percentage points; the quota sample an error 
of 7 percentage points. 
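
The argument of point 2 above can be made concrete with a small numerical sketch. Every figure in it is hypothetical and chosen only for illustration; none comes from the 1948 polls. It assumes that the grade-school respondents an interviewer actually reaches differ in their preference from the grade-school group as a whole, and shows that weighting restores the group's place in the sample without correcting its character.

```python
# Hypothetical illustration: every figure below is invented, not taken from any poll.

# Share of each education group in the electorate, and its true support for a candidate.
POPULATION_SHARE = {"grade_school": 0.50, "more_schooling": 0.50}
TRUE_SUPPORT = {"grade_school": 0.60, "more_schooling": 0.40}

# What a quota sample might bring back: too few grade-school respondents, and the
# ones reached are the more accessible, whose support differs from their group's.
SAMPLE_SHARE = {"grade_school": 0.35, "more_schooling": 0.65}
SAMPLE_SUPPORT = {"grade_school": 0.52, "more_schooling": 0.40}


def estimate(shares, support):
    """Average the group support figures, using the given group shares as weights."""
    return sum(shares[group] * support[group] for group in shares)


true_value = estimate(POPULATION_SHARE, TRUE_SUPPORT)
raw_estimate = estimate(SAMPLE_SHARE, SAMPLE_SUPPORT)
# Weighting the grade-school group up to its population share fixes its size only.
weighted_estimate = estimate(POPULATION_SHARE, SAMPLE_SUPPORT)

print(f"true support:      {true_value:.3f}")
print(f"raw quota sample:  {raw_estimate:.3f}")
print(f"after weighting:   {weighted_estimate:.3f}")
```

Under these assumed figures the weighted estimate moves toward the true value but cannot reach it, because the weight multiplies the answers of respondents who were themselves unrepresentative of their group.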


The interpretation of this evidence is obscured by the fact that neither 
method of sampling was followed according to its literal requirements. 
In the case of the quota method, the interviewers who used it were new 
to this method of sampling. They reported that they were unable to 
fill the lower income quota in 20 per cent of the cases. Whether this is 
the usual difficulty with the quota method which happened to be re- 
ported here because of the newness of the interviewers or whether this 
is unusually inadequate quota sampling is a matter of debate. The 
area sample also was not carried out perfectly and utilized liberal sub- 
stitutions. Nevertheless, the final figures show a superiority of the area 
sample over the quota method of five percentage points. 

Though it is unlikely that all of the prediction error was due to quota
sampling, it may have contributed between one and three percentage 
points to the Gallup and Crossley underestimations of the Truman vote. 
If this estimate is correct, then area-sampling would have indicated a 
closer election and counselled caution on all-out predictions. 


Applied Psychology and the Polls 


The growing criticism of the polls in the field of applied psychology 
has already become more sharply focussed with the 1948 prediction 
failure. Criticism has been directed at two main phases of polling opera-
tions: (1) the failure of the polls to keep abreast of technical and methodo-
logical advances in pure and applied social psychology or to do methodo-
logical research of their own; and (2) the reluctance of the polling agencies
to make public their data and their procedures such as sample size, the 
exact corrective adjustments employed, etc. Both of these points were 
made by the technical committee of social scientists, serving the Con- 
gressional Committee which investigated Gallup in 1944. 

This criticism is undoubtedly justified but it should not lead to a 
blanket condemnation of the public opinion polls. They have made real 
contributions in the past in stimulating a quantitative and factual 
approach to problems once dealt with by journalistic reporting or arm- 
chair political science. They can make greater contributions in the 
future if they take stock of their methods. The 1948 setback should 
be of real value to them in that they may see that they have been ham- 
pered by a blind empiricism in the past. This empiricism led them to 
feel that what had apparently worked once or twice or even three times 
was somehow sacred and could be relied upon to work in the future no 
matter how conditions changed. 

Nor should the failure of the public opinion polls be construed as an 
indictment of all research in the field of consumer needs and wants. 
There are many studies in this field to which the weaknesses of polling 
techniques do not apply. As in any new research, standards and methods 
in measuring consumer reaction vary considerably. It is of interest 
that some market research organizations, interested in sound methodo- 
logical development, had accepted true probability sampling before 
November 1948. 

Though social psychologists in general may want to work for im- 
provements in polling methods, it is important to distinguish between 
the polls and basic research in social psychology and the social sciences. 
Field work employing quantitative measurement on social psychological
problems should not be confused with polling any more than the labora-
tory work of the psychologist on problems of perception should be con-
fused with market research. Though there are areas of overlap, the
tendency of the layman to confuse basic and applied research should not
mislead the professional worker. This does not mean that the social
scientist should be completely divorced from applied research. It is
his responsibility to help formulate standards of research that will help
both types of research. Such standards are needed in the public interest
and by the polls themselves. They cannot afford a repetition of their
1948 experience.


Received December 21, 1948. 


Note on the Cardall Practical Judgment Test 


Dorothy H. Carrington 


Institute for Psychological Services, Illinois Institute of 
Technology, Chicago, Illinois 


The Cardall Practical Judgment Test (2) was given to over 300 un- 
selected men who had come for vocational guidance to the Illinois Institute 
of Technology. Their age range was from 16 to 63 years and the educa- 
tional level from 8th grade to persons holding the Ph.D. degree. All 
subjects also took: the Adams-Lepley Personal Audit Test (1); the ACE 
Psychological Examination (1942, college edition) (5); and the Otis 
Gamma (4). Scores on each of these tests were correlated with the
Practical Judgment scores. The scores of the Practical Judgment Test 
were also correlated with age and education. 

The results are given in Table 1. 


Table 1.
Correlations between Practical Judgment Test and Other Variables

                                                       Standard   Significance
                                     No. of            Error of       at:
Variables                            Cases      r         r        5%     1%

Age                                   361      .02      .0528      NS     NS
Education                             349      .21      .051       S      S
ACE Total                             310      .29      .046       S      S
ACE Quantitative                      311      .24      .053       S      S
ACE Linguistic                        307      .29      .052       S      S
Otis Gamma                            344      .37      .046       S      S
Personal Audit
  Seriousness-Impulsiveness Scale     315     -.05      .0563      NS     NS
  Firmness-Indecision Scale           316      .20      .054       S      S
  Tranquillity-Irritability Scale     315     -.03      .0564      NS     NS
  Frankness-Evasiveness Scale         316      .10      .0559      NS     NS
  Stability-Instability Scale         313      .15      .0567      S      S
  Tolerance-Intolerance Scale         314      .02      .0565      NS     NS
  Steadiness-Emotionality Scale       307      .17      .055       S      S
  Persistence-Fluctuation Scale       309      .09      .0567      NS     NS
  Contentment-Worry Scale             309      .07      .0568      NS     NS
Split Half r corrected by Spearman
  Brown Formula                       275      .69      .013




The split half reliability of the scores on the Practical Judgment Test
in the sample used is .69, corrected by the Spearman-Brown formula.
There is a low but statistically significant positive correlation between 
the Cardall Test and intelligence as measured. The correlation with 
formal education is also low but significant. For the most part, the 
Cardall Test does not correlate with the sub-parts of the personality 
test used although on Firmness-Indecision, Stability-Instability, and 
Steadiness-Emotionality there is a slight positive correlation which is 
significant. For the sample studied, the Cardall Test scores are in- 
dependent of age. 
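
The two statistics reported above can be sketched as follows. The note does not say how the standard errors in Table 1 were computed; the sketch assumes the classical large-sample formula (1 - r²)/√(N - 1), which comes close to several of the tabled values, and the usual Spearman-Brown step-up for a test of doubled length. The function names are illustrative only.

```python
import math


def standard_error_of_r(r, n):
    """Large-sample standard error of a correlation (assumed formula, not stated in the note)."""
    return (1 - r ** 2) / math.sqrt(n - 1)


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full-length test."""
    return 2 * r_half / (1 + r_half)


def half_test_r(r_full):
    """Invert the Spearman-Brown step-up to recover the implied half-test correlation."""
    return r_full / (2 - r_full)


# Education row of Table 1: r = .21, N = 349 (tabled standard error .051).
print(round(standard_error_of_r(0.21, 349), 3))

# The corrected reliability of .69 implies a raw split-half correlation near .53.
implied_half = half_test_r(0.69)
print(round(implied_half, 2), round(spearman_brown(implied_half), 2))
```

The inversion shows why the correction matters for the conclusion about individual prediction: the uncorrected correlation between the two halves is considerably lower than the .69 reported for the full test.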

Cardall correlated his test with the Army Alpha, Link’s Personality 
Quotient, Bell’s Adjustment Test and college grades and found no 
significant correlations. His correlation of the Practical Judgment and 
intelligence was —.05. This differs by 42 points from the results ob- 
tained in the present study. He does not give the number of persons in 
his sample nor the type of population on which they are based so it is 
difficult to tell what factors are causing the discrepancy. 

The data in this study differ from the data in Cardall’s Manual (2) 
in that in the Illinois Institute of Technology group, practical judgment 
as measured is not totally independent of intelligence and academic back- 
ground, and there is an indication that some personality factors influence 
test scores. According to the sample studied the reliability of the test 
is too low for the test to be used for individual predictions. 

Received July 15, 1948. 


References 
1. Adams, C. R. Manual of directions for the Personal Audit Test. Chicago: Science
   Research Associates, 1945.

2. Cardall, A. J. Manual of directions for the Practical Judgment Test. Chicago:
   Science Research Associates, 1942.

3. Guilford, J. P. Fundamental statistics in psychology and education. New York:
   McGraw-Hill Company, 1942.

4. Otis, A. S. Manual of directions for Otis Quick-Scoring Mental Ability Tests.
   Yonkers-on-Hudson: World Book Company, 1937.

5. Thurstone, L. L., and Thurstone, Thelma G. Psychological examinations for college
   freshmen. Washington, D. C.: The American Council on Education, 1942.


Originality Ratings of Department Store 
Display Department Personnel 


Catherine P. Dougan, Ethel Schiff and Livingston Welch 
Institute for Research in Clinical and Child Psychology, Hunter College 


In this study we attempt to measure creative thinking of the em- 
ployees in the display department of R. H. Macy’s, by means of the 
Welch Reorganization Test (1, 2) which obliges the subject to recombine 
familiar ideas according to four different patterns. It is Welch’s (1) 
assumption that the ability to recombine easily and reorganize ideas 
according to a specific plan is essential to all types of creative thinking. 
His contention is not that this is the only factor involved, but that the indi-
vidual lacking this ability will be seriously handicapped in an imaginative
capacity.

In two previous studies the Reorganization Test was given to 30 
professional artists, 25 art majors and 48 unselected students. We will 
compare the results of these investigations with those obtained in the 
present study. 


Procedure 


1. The Reorganization Test. The test is divided into four parts. The first 
three sub-tests make use of written material and the fourth makes use of blocks. 
The total testing time is 26 minutes. 


Part 1. Instructions 


Recombine the words of each group on the next page to make as many 
meaningful grammatical sentences as possible. For example, here is a group 
of ten words, 


MEN SKY IS FIGHT THAT THE SLOW BRIGHT OF FOR 


which can be recombined in the following sentences: 


Men fight for the sky. 
The sky is bright. 
The fight is slow. Etc.


You will receive as much credit for a short sentence as for a long one. Your
sentences do not have to be artistic, but they must be grammatical. There 
must be at least a subject and a predicate. You will receive credit for a sen- 
tence which is only slightly different from another. A word from the group 
can be used only once in the same sentence, but it may be used any number of 
times in other sentences. Only use words from the group that you are examin- 
ing at the time. You may skip from one group to another, if you like. 




There are ten of these groups and you have only ten minutes in which to 
complete the test. Are there any questions? . . . Do not turn the page until 
the examiner says “Start.”

The following are the ten groups of Part 1: 


1. Dog tree climbs runs those a smooth good by with
2. City John built stood a that large strong of from
3. Car fence travels was this that big cool for by
4. Sea woman move could these the green rough with of
5. Den lion ate is big deep these the of by
6. House child left has blue frightened the a for by
7. Lemon wife cooks finds that soft round with from
8. Potatoes maid cut once small hot these a of for
9. Fish boy waits catches the a long cold by from
10. Slowly the golden light that rested upon them moved away


Part 2. Instructions 


Make as many letters as possible using no more and no less than three
straight lines. For example, the letter A is made with three straight lines, two
slanting downward and one across. You will be given no credit for the letter
A, since it is an example.

Make as many letters as possible, using no more and no less than two
straight lines.

Make as many letters as possible, using no more than one straight line and
one semi-circle.

The time limit is three minutes. 


Part 3. Instructions 


On the next page you will be given a list of twenty words which you are to 
connect into a story. You must be certain to use the words in the order in 
which they appear on the list. If the first word is “tree” in your story this 
must be the first word which appeared on the list. You must not skip any of
the words.

Your story must be grammatical and logically related. It must have a
beginning and an end. You will be rated on the number of words you make
use of in the time allotted. Write as fast as you can and underline each of the
twenty words as you use it.

The time limit is three minutes.

The words used in this test were: 


STAIRS OCEAN CHEMISTRY SONG TEST MOUNTAIN BUBBLE DOG 


LEMON PICTURE POST BLANKET VIOLIN ARE 
STEAM LEG WINDOW SWAMP STAMP Bi) TM 


(The words were given in this order.) 


Part 4. Instructions 


The object of this test is to construct out of ten blocks on each trial as many
pieces of furniture or home furnishings as possible. The pieces of furniture
you build must fit properly. It must be symmetrical and be recognizable as a
piece of furniture. Do not attempt to be futuristic. Use conventional forms.
You must use a minimum of two blocks to construct a piece of furniture. You
may make as many of the same type of furniture as you like. You will receive
credit for a piece of furniture which is only slightly different from another.
[...] Hence, you have only two minutes for each trial.




The blocks used in all five trials were geometric shapes selected from a box 
of playing blocks. On each trial the blocks were presented to the subject on a 
piece of cardboard with each shape outlined so that the positions of the blocks
were standardized. A record was kept of all of the combinations of blocks for 
which credit was given. 

2. Rating Scale. Each subject in the display department was rated on a 
five point scale by the manager and by the assistant manager of that depart- 
ment. These men rated their employees independently; however, it must be 
borne in mind that these two men, over a period of time, must have exchanged 
ideas as to the originality or creative thinking of their employees. 


Subjects 


In the present study the Reorganization Test was given to a total of 
33 employees in the display department of R. H. Macy’s. Their indi-
vidual positions ranged in talents and included: artists, window,
show-case and floor display men, designers, stylists, and the executives
in charge.


Results 


The test results of these 33 department store employees in the display 
department were compared with those of the three groups, 30 professional 
artists, 25 art majors, and 48 unselected students, reported in the previous 
study. The mean scores and standard deviations for each group on 
each part of the test are shown in Table 1. 


Table 1

The Mean Performance Scores and the Standard Deviations
for Each Group on Each Part of the Test

               Professional        Art            Display         Unselected
                 Artists          Majors         Personnel         Students
                 N = 30           N = 25           N = 33           N = 48

Parts          Mean   S.D.     Mean   S.D.     Mean   S.D.     Mean   S.D.

1              17.7    7.2     21.9    7.6     10.8    6.8     18.0    4.2
2              12.5    1.9     13.2    1.0     11.9    2.3      6.7    1.8
3              11.4    4.1      7.3    2.5      6.8    4.0      9.1    3.2
4              18.4    7.8     13.9    9.1     14.0   10.5      3.4    2.7
Total Score    60.5   12.3     56.4   15.1     43.1   18.1     37.6    7.0


It will be seen that the display personnel compare almost equally 
with the unselected student in total score, whereas both are considerably 
lower than the art majors and professional artists. The only sub-test 
in which there is a striking difference of score is in Part 4, which is 
concerned with the construction of furniture with blocks. It is inter-
esting to note, therefore, that a large part of the department store
personnel tested were employees of the furniture and interior decorating
departments. 

The difference between professional artists, unselected students, and 
art majors has already been mentioned in the previous studies. 

All of the differences between sub-tests were put to test and some 
significant t-values were obtained. The t-values obtained for differences 
between the means of the four groups on each part of the test and be- 
tween the total score are presented in Table 2. 


Table 2
The t-Values Obtained for Differences Between Means of Groups*

           Prof. Artists   Prof. Artists    Art Majors       Display Pers.   Display Pers.
           and             and Unselected   and Unselected   and             and Unselected
Parts      Art Majors      Students         Students         Prof. Artists   Students

1              2.1             0.2              3.0              4.0             6.0
2              1.5            18.2             17.1          [illegible]        11.5
3              4.4             1.1              2.4              4.5             2.8
4              2.0            10.2              7.5              1.9             6.8
Total
Score          1.1            10.9              7.2              4.5             1.91

* t.05 = 2.0; t.02 = 2.3; t.01 = 2.6.


It appears that, for the total test score, the difference between the 
display personnel and the professional artists is statistically significant, 
while that between the display personnel and the unselected students is 
not. Parts 2 and 4 of the test seem especially important. In all cases 
except between the display personnel and the unselected students, the 
differences between the groups on these two parts seem to be consistently 
significant. 
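
The article does not state how the t-values were computed. As a sketch only, one common way to obtain a t-value from the means, standard deviations, and group sizes of Table 1 is the unpooled difference-of-means formula below; the function name is illustrative. For Part 1, professional artists against art majors, it gives about 2.1, in line with the first entry of Table 2. Not every published value is reproduced by this formula, so the original computation may have differed in detail.

```python
import math


def t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t-value for a difference between two group means, from summary figures only.

    Uses the unpooled standard error sqrt(sd_a^2/n_a + sd_b^2/n_b); this choice
    is an assumption, since the article does not give its formula.
    """
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Part 1 of the test: art majors (21.9, S.D. 7.6, N = 25)
# against professional artists (17.7, S.D. 7.2, N = 30).
t_part1 = t_from_summary(21.9, 7.6, 25, 17.7, 7.2, 30)
print(round(t_part1, 2))
```

A value of this size sits just above the 5 per cent point quoted under Table 2, which is why the Part 1 difference between these two groups is borderline.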

In order to determine the degree to which the test results agreed with
the creative ratings given by the department managers, the test scores
and the ratings were analyzed. A contingency coefficient of .60 was obtained.

It was considered that perhaps Part 4, alone, would be of high enough
reliability for judging creative thinking. However, when the results of 
Part 4 were analyzed it was found that the P value for this sub-test when 
used singly was small enough to cast doubt on the hypothesis. 


Summary and Conclusions 


The purpose of this study was to measure originality of department 
store display personnel. A special test was constructed by Welch in
which the subjects were obliged to recombine familiar ideas according to a
series of four different patterns. The subjects were rated by their
supervisors by means of a 5 point rating scale to provide performance data 
with which to correlate the Reorganization Test results. 

1. A contingency coefficient of .60 was obtained between the ratings 
given by the manager and the scores resulting from the Reorganization 
Test. 

2. The performance of the display personnel was compared with that 
of subjects examined in a previous study. The mean scores for the four 
groups are as follows: professional artists, 60.5; college art majors, 56.4; 
department store display personnel, 43.1; and unselected students, 37.6. 
The difference between the professional artist and the display personnel 
is statistically significant while that between the unselected student 
and the display personnel is not. However, the display personnel were 
superior to unselected students on two of the sub-tests. 

3. These results indicate that there is a possibility of measuring 
originality, as it would apply in the fields of advertising and display. 


Received July 12, 1948. 


References 


1. Welch, L. Recombination of ideas in creative thinking. J. appl. Psychol., 1946,
30, 638-643. 

2. Welch, L., and Fisichelli, V. R. The ability of college art majors to recombine ideas 
in creative thinking. J. appl. Psychol., 1947, 31, 278-282. 


The Rosenzweig Picture-Frustration Study in the Selection 
of Department Store Section Managers 


H. Wallace Sinaiko 
L. Bamberger & Co., Newark, N. J. 


This report is an outgrowth of a study of certain intelligence and 
personality characteristics of Department Store Section Managers.¹ A
battery consisting of two tests of mental ability and one measure of 
personality was administered to a group of 53 of 58 employed Section 
Managers. 

Findings with regard to the two intelligence tests were essentially 
negative: correlations between test scores and a quantitative rating of 
job performance were so low as to be chance deviations from zero. 
Similar treatment of scores from the personality test—the Rosenzweig 
Picture-Frustration Study—produced more fruitful results in terms of 
predicting job performance in Section Managing. 


Method 


The Instrument. The P-F Study consists of 24 cartoon-like pictures 
in booklet form. Each picture illustrates a frustrating situation involving 
two or more people. One figure is shown saying something about the 
situation while the caption box over the second person is blank. The 
subject is told to write the first reply that comes to his mind in the blank 
over the person being addressed in the picture. 

Six principal scores are derived from the P-F Study. Responses are 
categorized according to direction of aggression and type of reaction. 
“Direction” categories include the following: (1) Extrapunitiveness:
aggressions directed by the subject toward someone or something in the
frustrating situation; (2) Intropunitiveness: aggressions directed by the
subject toward himself; (3) Impunitiveness: the absence of aggressive
feeling. “Types” of reaction are: (1) Obstacle-Dominance: the prob-
lem, or situation, is predominant in the subject’s response; (2) Ego-
Defensive: blame, or responsibility, is assigned for what has happened;
(3) Need-Persistence: a solution to the problem is mentioned. The
scoring categories are symbolized by the letters E, I, and M, each corres-
ponding to the “direction” of aggression. “Type” of reaction is signified
by the use of the symbols O-D, E-D, and N-P.

¹ A definition of the job “Section Manager” appears in the Dictionary of Occupa-
tional Titles, Part I, page 576, as follows: “Manager, Floor; aisle man; [...]
section; (retail trade); 0-75.10; supervises employees in a designated section of the selling
floor; instructs new workers and sees that they follow store system in making sales;
shifts selling personnel from one department to another so that service will be efficient
and prompt; regulates lunch hours and grants permission for employees to leave the
floor; handles returned goods, approves bank checks, and adjusts claims or refers them
to the adjustment department; answers customers’ questions relative to merchandise or
location of merchandise; floor-walker (almost obsolete).”

The Group. As mentioned above, 91% of the 58 employed Section
Managers comprised the experimental group. Breakdown of the group
by sex was: 44 women and 9 men, 83% and 17% respectively. Length
of time on the job ranged from three months to 16.7 years (median = 18
months, Q1 = 6.5 months, and Q3 = 66 months). Formal education
ranged from eight years to seventeen years. One-eighth of the group
had not completed high school, 50% had completed one or more years of
college, and 15% were college graduates. Age ranged from 21 to 57 years
(median = 30, Q1 = 24, and Q3 = 36).

The Criterion. A quantitative measure of job performance was built 
with information obtained from Executive Personnel History forms. 
This is a modified linear rating scale used throughout the company in its 
semi-annual personnel review and rating of all executives. Executives, 
or Section Managers in this case, are rated on six basic qualities, each 
being subdivided into from two to ten categories. These qualities include 
Character, Intelligence, Intuition, Experience, Adaptability, and Special 
Skills. Ratings are made on the subdivisions under each main category. 
For example, Intuition is rated for each of two points: “Are decisions 
based on limited data usually correct?” and “Are decisions arrived at 
without undue delay?” 

Actual ratings assigned to the subdivisions are confined to the fol- 
lowing values: Outstanding, Above Average, Average Plus, Average, 
Below Average, and Unsatisfactory. 

Each Section Manager is rated by one of four Floor Superintendents. 
All ratings are checked by the Chairman of Personnel Reviews. Thus, 
there is a “common denominator” roughly operating to keep ratings by 
different supervisors comparable. All Section Manager ratings used were 
on a minimum of three months’ service on the job. 

To convert the above descriptive ratings into quantitative terms
arbitrary weights were assigned as follows: Outstanding, 11; Above
Average, 9; Average Plus, 7; Average, 5; Below Average, 3; and Unsatis-
factory, 1. Mean point values were computed for each of the six basic
qualities weighted and summated.

A seventh basic category on the Executive Personnel History form,
“Placement and Development,” was treated slightly differently. The
subdivisions, “Is he well placed on his present job?”, and “Is he satisfied
with his present status?”, were weighted as follows: Yes, 10; Yes, qualified,
5; No, -10; and No, qualified, -5. This weighted score was algebraically
added to the summated averages of the preceding six rated qualities.
This final figure gave us a quantitative criterion measure for each Section
Manager. A frequency distribution of ratings for the entire group
showed a range of approximately 50 points (24.6 to 74), a mean of 58.9,
and a standard deviation of 9.6.
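
The scoring procedure just described can be sketched as follows. The point values are those given in the text. Two details are assumptions, since the text leaves them open: the six quality means are summed without further weighting, and the points for the two Placement and Development questions are added together. The function name and the example manager are hypothetical.

```python
# Point values stated in the text for the descriptive ratings.
RATING_POINTS = {
    "Outstanding": 11,
    "Above Average": 9,
    "Average Plus": 7,
    "Average": 5,
    "Below Average": 3,
    "Unsatisfactory": 1,
}

# Point values stated in the text for the two Placement and Development questions.
PLACEMENT_POINTS = {"Yes": 10, "Yes, qualified": 5, "No": -10, "No, qualified": -5}


def criterion_score(quality_ratings, placement_answers):
    """Sum of the mean points of each rated quality, plus the placement points.

    quality_ratings maps each basic quality to the list of ratings on its subdivisions.
    Summing the quality means unweighted, and summing the two placement answers,
    are assumptions made for this sketch.
    """
    quality_total = 0.0
    for ratings in quality_ratings.values():
        points = [RATING_POINTS[rating] for rating in ratings]
        quality_total += sum(points) / len(points)
    placement_total = sum(PLACEMENT_POINTS[answer] for answer in placement_answers)
    return quality_total + placement_total


# Hypothetical Section Manager rated "Average" on every subdivision.
qualities = ["Character", "Intelligence", "Intuition",
             "Experience", "Adaptability", "Special Skills"]
example_ratings = {quality: ["Average", "Average"] for quality in qualities}
example_score = criterion_score(example_ratings, ["Yes", "Yes, qualified"])
print(example_score)
```

Because the placement points can be negative, a manager judged poorly placed can fall well below the total of his quality ratings, which may help explain the long lower tail of the reported distribution.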


Results 


Ratings of each Section Manager’s job performance and the six 
principal scoring categories of the P-F Study were correlated (Pearsonian 
product-moment r). Table 1 summarizes these relationships as well as 
those between P-F Study scores and length of service. 
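
As a rough check on the significance levels attached to these correlations, an r can be converted to a t value with N − 2 degrees of freedom. The sketch below assumes N = 53 and a two-tailed test; the critical values are approximate figures from standard t tables for 51 degrees of freedom, and the function names are illustrative only. The article does not say which significance test was actually used.

```python
import math

N = 53              # number of Section Managers in the study
T_CRIT_05 = 2.008   # approximate two-tailed 5% critical t for df = 51
T_CRIT_01 = 2.676   # approximate two-tailed 1% critical t for df = 51


def t_from_r(r, n=N):
    """Convert a Pearson r to a t statistic with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def significance(r):
    """Return '**', '*', or '' for a correlation based on N = 53 cases."""
    t = abs(t_from_r(r))
    if t >= T_CRIT_01:
        return "**"
    if t >= T_CRIT_05:
        return "*"
    return ""


# Job-rating correlations quoted in the article's discussion.
for label, r in [("E", -0.31), ("I", 0.28), ("M", 0.25),
                 ("E-D", -0.48), ("N-P", 0.38)]:
    print(f"{label:>4}  r = {r:+.2f}  t = {t_from_r(r):+.2f}  {significance(r)}")
```

On this reckoning E and I reach the 5% level, E-D and N-P reach the 1% level, and M falls short of significance, which agrees with the pattern described in the discussion.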


Table 1 


Pearson Correlations between Rosenzweig Picture-Frustration Study Scores, Length 
of Service, and Job Ratings of 53 Department Store Section Managers 


                          Picture-Frustration Study Scores

                       E        I        M        O-D           E-D       N-P
Length of Service    −.23      .15      .39**    [illegible]   −.12      −.01
Job Ratings          −.31*     .28*     .25      −.02          −.48**     .38**

 * Significant at the 5% level.
** Significant at the 1% level.

Discussion 


Length of Service. The distribution of this variable had marked 
skewness toward the right, or longer period of time on the job. The 
mean length of time in Section Managing was approximately 34 months 
while the median was only 18 months. Hence, there seems to be a fairly 
high rate of turnover, with only a small group of “long stays” in the job.

One score on the P-F Study showed a statistically significant relationship
to length of service: Impunitiveness. Thus, there is a tendency
among the longer-staying Section Managers to show more M (minimizing,
absence of blame-placing, conformity) in their responses than
among the more recently hired of the group.²


² Correlations were run between age and P-F Study scores. One significant
relationship, r = −.28, p = .05, was found between Extrapunitiveness and this variable.
All other correlations between age and P-F Scores approximated zero.


The Rosenzweig Picture-Frustration Study 39 


Job Ratings. There were four statistically significant correlations 
between P-F Study scores and job ratings. Keeping in mind the re- 
quirements of a Section Manager’s duties, these relationships follow a 
logical pattern. There was a negative correlation between the criterion 
and E: r = −.31, p = .05. Better Section Managers show relatively fewer
extrapunitive, aggressive, responses. Management requires of its Section 
Managers a constant display of good-will in their customer contacts. 
A large number of these contacts occur under strained circumstances 
produced by such things as complaints about quality of merchandise, 
non-delivery, or service, etc. Section Managers must obviously refrain 
from any show of aggressiveness in handling these adjustments if they 
are to maintain customer friendship toward the store. 

The correlation between the criterion and I, r = .28, p = .05, indicates
that better Section Managers show a tendency to turn their aggressions 
against themselves. Intropunitiveness may be a necessary adjunct of 
efficient Section Managing. The somewhat hackneyed phrase, “The 
customer is always right,” is an attitude actually encouraged by Manage- 
ment. In other words, store policy regarding customer relations is itself 
an intropunitive one. 

M scores were correlated with the criterion to a positive, but statistically
non-significant, degree: r = .25, p = .10. There is a slight tendency
for better Section Managers to avoid defining situations as conflictual 
and to see them as non-frustrating. 

The first of the P-F Study scores relating to type of reaction, O-D, 
showed a practically zero correlation with the criterion: r = .02, p = .50.

The highest correlation between P-F Study scores and job ratings
was found between E-D and the criterion: r = −.48, p = .01. Thus,
low-rated Section Managers tended to be more defensive when con- 
fronted by the test situations; i.e. they were overly concerned with fixing 
responsibility, either in assuming blame themselves or in blaming some- 
one else. 

N-P scores on the P-F Study showed a moderate relationship with 
the criterion: r = .38, p = .01. High-rated Section Managers tend to
have an adaptive, or solution-seeking, attitude for dealing with every-day 
problem situations. 

Additional Statistical Data. Correlations were run between ratings 
and four variables, length of service, age, education, and sex. This was 
done to determine whether any of the reported relationships might be 
an artifact of one of these variables. Correlations were as follows: 
(1) between age and ratings: r = .19, p = .15; (2) between length of
service and ratings: r = .34, p = .02; (3) between education and ratings:
r = .25, p = .06; (4) between sex and ratings: r = .27, p = .04. Thus, the
latter three variables, length of service, education, and sex, are related
to job ratings to a statistically significant degree. Women tended to get
higher ratings than men.

P-F Study scores of the 15 highest-rated Section Managers are com- 
pared with scores of a like number of the lowest-rated in Table 2. The 
comparison of mean P-F Study scores of top-rated and bottom-rated 
Section Managers, shown in Table 2, confirms the earlier discussed cor- 
relational findings. However, the statistical significance of these differ- 
ences is greatly reduced by the small number of cases. In any event, 
differences do exist in the direction indicated by the overall correlations 
on the total group of 53 Section Managers. 


Table 2 


Comparison of P-F Study Mean Scores* of the 15 Highest-Rated Section 
Managers and the 15 Lowest-Rated Section Managers 


                          Scoring Categories

                      E        I        M       O-D      E-D      N-P
Highest-Rated
  Mean              37.1     32.5     30.6     18.1     48.5     33.6
  Sigma             17.05     7.5     11.8      7.2      8.2     10.9
Lowest-Rated
  Mean              43.4     29.6     27.0     17.4     54.6     28.1
  Sigma             14.7      7.6      9.6      5.9      9.7     12.0
t                    1.07     1.15      .90      .26     1.83     1.28
p                     .28      .24      .36      .50      .06      .20


* Values for each category represent the proportion of the total number of responses
made in the test falling in that category.
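
The t values in Table 2 can be approximated from the printed means and sigmas alone. The following is a minimal sketch, assuming a pooled-variance t for two independent groups of 15 and treating each sigma as a sample standard deviation; the article does not state which formula was used, and the function name is illustrative. The results come out close to, but not identical with, the printed values.

```python
import math


def t_independent(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t for the difference between two independent means.

    Assumption: sd1 and sd2 are sample standard deviations (n - 1 divisor).
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_error


# Highest-rated versus lowest-rated groups, figures from Table 2.
t_e = t_independent(37.1, 17.05, 15, 43.4, 14.7, 15)    # E scores
t_ed = t_independent(48.5, 8.2, 15, 54.6, 9.7, 15)      # E-D scores

print(round(t_e, 2), round(t_ed, 2))
```

The signs are negative because the highest-rated group has the lower mean on both E and E-D; Table 2 reports the absolute values.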


Table 3 compares quartiles of P-F Study scores of the 15 highest- 
rated and 15 lowest-rated Section Managers. That there is a great deal 
of overlap between the top-rated and bottom-rated Section Managers’ 
scores is apparent. Thus, the P-F Study is not a highly valid selection 
device by any means, although tendencies do seem to be indicated 
insofar as performance in Section Managing is concerned. 

A further check on the efficiency of the P-F Study with the present 
occupational group was made by using a “combined P-F index.”* The
index was built by adding the number of I, M, and N-P responses made 
by each Section Manager, and then subtracting the number of E and 
E-D responses. In this way a simple algebraic expression, which could 


* This index was suggested to the writer by Dr. H. G. Gough, Department of Psy- 
chology, University of Minnesota, in a personal communication. 




Table 3 


Comparison of P-F Study Quartiles* for 15 High-rated 
and 15 Low-rated Section Managers 


                                       Scoring Categories

                        E           I           M          O-D         E-D         N-P
                     Q1 Md Q3    Q1 Md Q3    Q1 Md Q3    Q1 Md Q3    Q1 Md Q3    Q1 Md Q3

High-rated Group     21 34 48    23 34 44    17 32 40    12 19 23    41 50 54    22 33 44

Low-rated Group      32 43 52    24 29 35    19 26 35    10 18 23    52 54 66    23 26 35


* Values for each category represent the proportion of the total number of responses 
made in the test falling in that category. 


be either positive or negative, was derived for each of the top-rated 15 
Section Managers and each of the bottom-rated 15. A comparison of 
indexes thus obtained on each of the two groups of Section Managers 
is shown in Table 4. 

If a cutting score were to be established at plus 2 we would eliminate 
5 of the top-rated 15 Section Managers and 11 of the bottom-rated 15. 


Table 4 


Comparison of Combined P-F Study Indexes of 15 Highest- 
Rated and 15 Lowest-Rated Section Managers 


                 Indexes
Case     High-Rated     Low-Rated

  1         16.0           15.0
  2         15.5           12.0
  3         12.5            7.5
  4         12.0            6.0
  5         12.0            0.0
  6          9.0           −1.5
  7          6.0           −2.0
  8          4.5           −4.5
  9          4.0           −6.0
 10          2.5           −6.5
 11         −4.0           −6.5
 12         −7.5          −12.5
 13        −10.0          −13.0
 14        −13.5          −17.0
 15        −18.5          −27.0




The use of a simple index, such as that described here, corroborates the 
discussion of Table 3. Thus, the P-F Study is far from being a highly 
valid selection tool although it does warrant some consideration in the 
hiring of Department Store Section Managers. 
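
The combined index and the effect of the cutting score can be illustrated with the figures printed in Table 4. This is a sketch only: the function names are hypothetical, and it assumes that a manager is rejected when his index falls below plus 2 (no index in the table is exactly 2, so the treatment of ties does not matter here).

```python
def combined_index(i, m, n_p, e, e_d):
    """Combined P-F index: I + M + N-P responses minus E and E-D responses."""
    return (i + m + n_p) - (e + e_d)


def rejected(indexes, cutting_score=2.0):
    """Count the cases whose index falls below the cutting score."""
    return sum(1 for value in indexes if value < cutting_score)


# Indexes as printed in Table 4.
HIGH_RATED = [16.0, 15.5, 12.5, 12.0, 12.0, 9.0, 6.0, 4.5, 4.0, 2.5,
              -4.0, -7.5, -10.0, -13.5, -18.5]
LOW_RATED = [15.0, 12.0, 7.5, 6.0, 0.0, -1.5, -2.0, -4.5, -6.0, -6.5,
             -6.5, -12.5, -13.0, -17.0, -27.0]

print(rejected(HIGH_RATED), "of", len(HIGH_RATED), "high-rated rejected")
print(rejected(LOW_RATED), "of", len(LOW_RATED), "low-rated rejected")
```

With the cutting score at plus 2, the counts agree with the text: 5 of the 15 top-rated and 11 of the 15 bottom-rated Section Managers fall below the line.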


Summary 


1. The Rosenzweig Picture-Frustration Study was administered to 
53 Department Store Section Managers. Quantitative measures of job 
efficiency were built from personnel review data and correlated with 
each of the six principal scores derived from the P-F Study. 

2. Statistically significant negative relationships occurred between the 
criterion and scores for Extrapunitiveness, and between the criterion and 
Ego-Defensive scores. Positive, statistically significant relationships be- 
tween the criterion and Intropunitiveness, and between the criterion 
and Need-Persistive scores were found. A positive, but not significant, 
correlation was found between the criterion and Impunitiveness. A near- 
zero relationship existed between job ratings and Obstacle-Dominance 
scores on the P-F Study. 

3. A simple technique of combining P-F scores into an index would 
admit 10 out of 15 top-rated Section Managers and would reject 11 out 
of 15 bottom-rated Section Managers if a cutting score of plus 2 was used. 

4. This investigation suggests that the Rosenzweig Picture-Frustration 
Study measures factors which are associated with occupational success 
as a Section Manager, and which might have value in an employment 
selection program. 


Received July 6, 1948. 




The Rorschach as a Predictor of Academic Success 


Boyd Rowden McCandless 
Ohio State University 


Many studies have been made, and many claims advanced for the 
Rorschach as a highly useful test in the area of academic prediction. 
The thinking behind the studies is perhaps best summarized in Klopfer 
and Kelly (3, p. 266): 


“If the Rorschach method could do nothing else but estimate the intellectual
level of the subject as well as the usual intelligence tests, these tests would be
preferable since they are simpler to apply. The importance of the Rorschach
method for the intellectual aspect of personality diagnosis lies in something
which no intelligence test attempts, the differentiation between potential
capacity and actual efficiency.”


Beck (1), Rappaport by implication (5), and Munroe (4), in a careful
experimental study of college women, concur in such an estimate of the
Rorschach test as a measure of intelligence and a predictor of success.


Munroe (4) has worked out a 28-item check list, usable with either
the group or individually administered Rorschach. It is filled in by a
protocol inspection method, and general adjustment has been found by
her, working with women students at Sarah Lawrence, to correlate
negatively with the number of checks accumulated by the subject. In general,
girls with fewer than 10 checks were reasonably adequately adjusted; girls
with more than 10, moderately to seriously maladjusted (4, p. 66). The
“Inspection Rorschach” adjustment rating predicted academic success
somewhat better than did ACE percentile ratings: coefficients of contingency
were .43 and .36 for 348 subjects; corrected, .49 and .39 for the two
tests respectively (4, p. 76).

Beck (1) has devised an organization, or Z, score to be derived, essentially,
from individually administered Rorschach tests. To quote:


. . . the sum of all the Z scores in any Rorschach record is the measure of
S's [illegible] activity. These totals vary directly as the intelligence of S.
The factor has certain virtues not inherent in W. For one thing, it takes
account of much activity that W misses. Second, since it is not scored in
[illegible] units, as is necessary in the case of W, it makes it possible to take
account of intermediate values and continuous distributions, and is thus a more
flexible measure. Third, it is an index of the intellectual energy as such,
irrespective of the kind of intelligence that S uses, something that does influence
W. Thus Z is a more accurate representative of the intelligence functioning
per se. . . . it is therefore an index of thinking power. Its essence is the
capacity to grasp relations not perceived by others (1, p. 12).




The three authors, Beck (1), Klopfer and Kelly (3), and Rappaport
(5), the last less directly than the first two, assign predictive values in a general
fashion to many categories of the Rorschach. Munroe (4), as stated,
has done so specifically and empirically.

Of the score for number of whole responses to blots produced, Beck 
says: “The higher the intelligence potential of an individual, the more 
W he can produce” (1, p. 10). Klopfer and Kelly state: “. . . (W) 
represents an emphasis on the abstract forms of thinking and the higher 
forms of mental activity” (3, p. 259). Both qualify the quantitative 
use of this W score, stating that the quality of W must be considered. 

Of large detail (D) and small detail (Dd) responses to the individual
cards, Beck states that where emphasis is on D there is revealed “a person 
who attends to obvious and practical interests” where Dd shows an 
“evidence of some need to pursue too much the elements that most people 
disregard”; and emphasis on W “is the sign of an over-all thinker” 
(1, p. 14). Klopfer and Kelly believe that the individual with approximately
two-thirds of his responses listed as D and Dd “has enough common sense
to use the most obvious material before he starts seeking the unusual” 
(3, p. 260). 

Summarizing the thinking of the various authors on other categories, 
with perhaps some injustice, it appears to be, from the point of view of 
making predictions of efficiency: 

Animal (A) responses tend to indicate a certain amount of conformity 
of thought, too few indicating unusual thought processes, too many 
barrenness, lack of creativity and stereotypy. 

Popular responses have roughly the same meaning. 

The percentage of responses made on the basis of form alone indicates in a
general way an intellectual, unemotional approach to life; the percentage
of form responses at a superior level is directly related to functioning
intelligence.

Human movement responses betoken creative imaginativeness, with 
qualifications set on their location and type. 

Responses dominated by color, but with form present (CF), betoken 
an adjustment intermediate between infantile and fully, socially adult, 
as far as emotional control is concerned. Responses dominated by form 
but using color are given by the emotionally fairly rich but controlled, 
mature person. 

Vista or perspective responses (FV or FK) are used by persons who
are self-critical and liable to “inferiority feelings.” Flat grays (FY or
FC’) used in responses to the cards indicate anxiety and reduction of
intellectual energy.




The broader the subject’s interests, the richer his educational
background, and the higher his intelligence, the greater the variety of
things he will see.

In general, the more intelligent and the less anxious the subject is, 
the more complete human figures he will see; and the more whole human 
and animal with relation to detailed human and animal figures there 
will be in his record. Finally, the more intelligent he is, the larger the 
total number of responses he will give. Klopfer and Kelly (3, p. 208), 
however, do not agree fully with this. The seeing of things in the white
spaces tends to betoken resistive, persistive, and unusual methods of
approach.

It is with these elements of the Rorschach that the author has con- 
cerned himself in this study. He realizes most clearly that the Rorschach 
is essentially a configurative test, where a pattern of factors must be 
taken account of to make any really adequate interpretation or prediction. 
On the other hand, he feels that the more checks made on the predictive 
efficiency of the specific categories of the Rorschach for which predictive 
efficiency has been claimed, the more valid is the use of the test for such 
purposes of prediction. In the case of this study, this prediction is in 
the areas of academic progress and achievement. 


Subjects and Method 


Individual Rorschachs were given, in conjunction with vocational
guidance, to approximately two hundred Officer Candidates during the
writer’s assignment as Selection and Classification Officer to the U. S.
Maritime Service Officers School, Alameda, California. These men were
aspiring for marine licenses and commissions in the U. S. Maritime
Service, and undergoing a four months’ period of training pursuant to
that end. Every subject who could be matched on the eight criteria
used was selected for this study.¹

These men were “normal” in that they were functioning adequately
in a wartime society, contributing to the war effort, making in general
adequately and highly motivated progress toward their specific goal, and
were in no case undergoing psychiatric treatment.

Thirteen pairs of men were matched on the basis of AGCT score;
average Mechanical Comprehension Test score; average Iowa Silent
Reading comprehension test score (form Am, new, advanced); average
Stanford Advanced Arithmetic Reasoning Test score; average age and
amount of education; marital status (six married, two divorced, five
single in each group); and enrollment in division of the school (ten
members of each group were enrolled in Deck training, three in Engine
training).


¹ This study reflects the author’s conclusions and is not an official Maritime Service
publication.


The basis of differentiation was in terms of the academic grade 
averages. With a value of 5.0 assigned to grade A and 1.0 assigned to 
grade F, the high grade point group averaged 4.7 ranging from 4.5 to 
5.0; and the low grade point group struggled through the school with 
average grades of 2.9, ranging from 1.0 to 3.6. Some, indeed, of the low 
grade point group failed to qualify academically for their licenses and 
commissions. 


Table 1 
Quantitative Characteristics and Differences between 
High and Low Grade Point Groups 

                            High Grade   Low Grade                          Significant
Characteristic                 Point       Point      Diff.   t of Diff.    at % level

AGCT                           135.7       135.7       0.0      0.000      Greater than 5
MCT                            141.5       139.4       2.1      0.430      Greater than 5
Arithmetic Equated Score        94.1        89.5       4.6      1.367      Greater than 5
Reading Standard Score         103.9        99.4       4.5      1.213      Greater than 5
Age (years)                     25.6        24.4       1.2      0.679      Greater than 5
Education (years)               12.3        12.1       0.2      0.605      Greater than 5


It will be noted from Table 1 that these men are very superior, 
psychometrically speaking. The mean AGCT score for the rank and 
file was set at 100, with a S. D. of 20 points. The average for this group 
was 1.75 S. D. above the national mean.
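
The standing of the group on the AGCT can be expressed as a standard score. A minimal sketch follows, assuming the stated norms (mean 100, S. D. 20) and the group mean of 135.7 shown in Table 1; the function name is illustrative.

```python
AGCT_MEAN = 100.0   # mean for the rank and file, as stated in the text
AGCT_SD = 20.0      # standard deviation, as stated in the text


def standard_score(raw, mean=AGCT_MEAN, sd=AGCT_SD):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd


group_z = standard_score(135.7)   # mean AGCT score of both matched groups
print(round(group_z, 2))
```

The result, a little under 1.8 S. D., is in line with the figure of roughly 1.75 S. D. quoted above.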

The groups average 2 S. D. above the mean on mechanical compre- 
hension, as measured; and in math and reading comprehension, approach 
the average third year college man. 


The ranges for the equating scores are given in the following tabulation:


Variable        High Grade Point     Low Grade Point
AGCT                124-149              124-150
MCT                 123-161              119-161
Math                 67-103               65-105
Reading              91-113               93-120
Age                  19-87                19-32
Education            10-15                10-14


Between these two groups of men, so similar in quantitative characteristics,
so different in academic success, the problem was to distinguish,
if possible, personality characteristics which might explain the efficiency
differences. Their Rorschachs were studied intensively in an effort to find such
distinguishing characteristics.




Results 


The results of the present study were negative, with one exception. 
Table 2 summarizes the averages for the high and the low grade point 
groups. Differences in the direction of the low grade point group are 
indicated by a minus (—) sign; t (Edwards (2)) is given in the fourth 
column and the level of significance of the t in the fifth column. 

It will be noted that t approaches the one per cent level of confidence
in only one case, mean number of popular responses. Even here, the
difference is of little practical significance (8.1 versus 6.6 mean popular
responses for the respective groups).

Differences favoring the high grade point group, with t’s above 1.0, are found for:
Mean number large detail responses; Mean number tiny detail responses;
Mean number space responses; Mean number human movement responses;
Mean number pure form responses; Mean number superior pure
form responses; Mean number animal responses; Mean number human
detail plus animal detail responses; Mean number popular responses; and
Mean quality of whole responses.

Differences favoring the low grade point group, with t’s above 1.0, are found for:
Mean number whole responses; Mean number achromatic color responses
(both including and excluding texture); and Mean number of
color-form responses.

It has been considered by most of the authors working with the
Rorschach that the ratio of W (whole blot) responses to the number of
M (human movement) responses is one of the best of the predictive
factors for “efficiency” or productivity, with a ratio of 3 to 1 being considered
optimal. Ratios falling materially below 3.0 are considered to
characterize “underproductive” persons; ratios falling materially above
3.0 are considered to characterize “over-striving” persons, whose performance
is likely to be describable as “quantity” rather than “quality.”
These latter may produce much, be over-ambitious, and be under considerable
strain; and their products are likely to be superficially acceptable rather
than really good.

If such considerations hold for these two groups, we should expect
the high grade point men to have a mean ratio approximating 3.0, which,
if deviant, would probably be expected to be above 3.0; the low grade
point group would be predicted to show a mean ratio falling below 3.0.
As can be seen from Table 2, the opposite is true, the high grade point
men showing a mean ratio of 1.6; the low grade point men a ratio of 3.4.
The difference, however, is not a statistically significant one.
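
The interpretive rule for the W:M ratio can be written out as a small sketch. The optimal value of 3.0 comes from the text; the tolerance of 0.5 used to stand in for "materially" above or below is purely a hypothetical choice, as are the function names and the handling of records with no M responses.

```python
def wm_ratio(w, m):
    """Ratio of whole (W) to human movement (M) responses.

    Returns None when there are no M responses, since the ratio is undefined.
    """
    if m == 0:
        return None
    return w / m


def classify_wm_ratio(ratio, optimal=3.0, tolerance=0.5):
    """Label a W:M ratio; `tolerance` is an assumed band around the optimum."""
    if ratio < optimal - tolerance:
        return "underproductive"
    if ratio > optimal + tolerance:
        return "over-striving"
    return "near optimal"


# Mean ratios reported for the two groups.
print("high grade point:", classify_wm_ratio(1.6))
print("low grade point: ", classify_wm_ratio(3.4))
```

Under this assumed tolerance the high grade point mean would be labelled underproductive and the low grade point mean near optimal, which is the reversal of expectation noted in the text.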

Beck’s Z or organization score (a measure of the “capacity to grasp
relations not perceived by others” (1, p. 12)) differentiated even less
effectively than the conventional Rorschach categories discussed above.




Table 2 


Selected Rorschach Differences and their Significance 
for High and Low Grade Point Groups 


                                      High      Low
                                      Grade     Grade                          % Level of
Mn. for Category                      Point     Point    Diff.   t for Diff.   Confidence

N Responses                           39.4      32.6      6.8      0.916     Greater than 5
N Whole R’s¹                           6.4      10.0     −3.6      1.532     Greater than 5
N Detail R’s¹                         26.1      19.8      6.3      1.284     Greater than 5
N Tiny Det. R’s¹                       6.1       2.7      3.4      1.183     Greater than 5
N Main and Additional
  Space R’s¹                          11.3       8.4      2.9      1.029     Greater than 5
N Human Mov’t R’s                      5.8       3.5      2.3      1.257     Greater than 5
N Animal Mov’t R’s²                    4.2       3.2      1.0      0.957     Greater than 5
N Inanimate Mov’t R’s²                 1.8       1.7      0.1      0.121     Greater than 5
N Vista R’s                            1.9       1.8      0.1      0.146     Greater than 5
N Form R’s                            18.8      13.1      5.7      1.528     Greater than 5
N Superior Pure Form R’s¹             13.8      10.3      3.5      1.791     Greater than 5
N Superior of Total R’s in form¹      30.4      25.4      5.0      0.935     Greater than 5
N Achromatic Color R’s*                2.2       4.8     −2.6      1.525     Greater than 5
N Achromatic Color R’s*                3.2       4.9     −1.7      1.068     Greater than 5
N Form-texture R’s²                    3.0       2.3      0.7      0.321     Greater than 5
N Form-color R’s*                      3.2       3.4     −0.2      0.119     Greater than 5
N Color-form R’s                       1.5       2.5     −1.0      1.308     Greater than 5
Mn. Sum Color²                         3.4       4.5     −1.1      0.764     Greater than 5
[illegible]                        [illegible] [illegible] 0.9     0.956     Greater than 5
N Animal R’s                          12.1       8.7      3.4      1.555     Greater than 5
N Human Detail +
  Animal Detail R’s                   10.8       4.0      6.8      1.789     Greater than 5
N Anatomy R’s                          2.5       2.9      0.4      0.610     Greater than 5
N Popular R’s¹                         8.1       6.6      1.5      2.836     Less than 5 (1)
N Response Categories                 11.3      10.6      0.7      0.741     Greater than 5
Z Score*                              43.3      48.5     −5.2      0.409     Greater than 5
N Checks Munroe                       11.4      12.1      0.7      0.359     Greater than 5
[illegible] Wholes²                    1.8       2.4     −0.6      0.556     Greater than 5
Mn. Whole Quality                      2.0       1.7      0.3      1.863     Greater than 5
W : M Ratio*                           1.6       3.4      1.8      0.691     Greater than 5
Human + Animal R’s :
  Human Detail + Animal
  Detail R’s*                          2.7       5.8     −3.1      0.591     Greater than 5
Color Ratio*                           2.0       1.1      0.9      0.408     Greater than 5
[illegible row]                                                              Greater than 5

¹ After Beck’s (1) criteria.
² After Klopfer’s and Kelly’s (3) criteria.
³ After Rappaport’s (5) criteria.
⁴ There were too few pure color, or texture-form or pure texture responses to compute
[remainder illegible].
⁵ Based on 11 pairs, due to zero in numerator of 2 ratios. t was computed on these
ratios as with N’s, since the various Rorschach authors seem to regard the relationship as
a unit or entity.




What slight, statistically non-significant discrimination it did make was 
in the wrong direction; mean Z score for the high grade point men was 
43.3, with a range from 9 to 99.5; for the low grade point men, mean Z
score was 48.5, with a range from 6 to 122.5. The t for this difference was .409.

As a final check, Munroe’s (4) check sheet, which gave positive results
for the Sarah Lawrence students, was filled out for each man. Here
the small difference was in the right direction (high grade point
men averaged 11.4 checks; low grade point men 12.1 checks). The range
was wider, however, for the former group (4-28) than for the latter
(4-20). The men would also appear to be seriously maladjusted
according to the findings of Munroe, who considers ten checks as a cutting
score (4, p. 66). Her students were not given the individual Rorschach,
which may account for the greater number of checks earned by this group.


Discussion 


Despite the consistently negative results of this investigation, certain 
trends appear according to prediction. In general, the high grade point 
men are seen to be slightly more controlled emotionally or with less 
emotion to control, slightly more productive; on most criteria, slightly 
less anxious. They tend to show up with higher averages, even when the 
factor of their higher productivity is cancelled out, in the scores which 
indicate conformity (except for space responses), and appear, although
not significantly, better able to attend to the large, usual, and the tiny,
unusual details of the Rorschach blots. If one can generalize from such
a tendency, it might be said that such a solid, conforming, non-theoretical
approach is one of the bases for academic success, particularly in a “cram”
type of program such as the Officer Candidate programs tended to be.
The only significant difference (more popular responses for the successful 
students) fits this trend. 

The author does not feel that the findings of this paper detract from 
the clinical use of the test; but, he believes it essential that many such 
checks as this be made. Finally, he grants the extreme difficulty of the 
task to which the Rorschach has been set in this case (restricted range 
and high level of ability, possible similarity of personality due to choice 
of occupation, small number of cases, etc.). Many authors, however,
seem to have taken it for granted that the task could easily be accomplished.

Finally, other patterns, or combinations of the factors discussed above,
or some total scoring or weighting system other than Munroe’s, could conceivably
be found to make a clear differentiation between these groups
of men who differed so significantly in performance in the highly motivated
Officer Candidate situation. The author’s repeated scrutiny of
the tests has failed, however, to reveal such patterns.


Summary 


Two matched groups of Officer Candidates, U. S. Maritime Service, 
who differed widely in academic achievement in a highly motivated, 
wartime, officer training program, were given individual Rorschachs
with the following results:

1. An analysis of the conventional Rorschach categories failed to 
demonstrate any important statistically significant differences, although 
trends appeared. 

2. Munroe’s (4) check list, which discriminated good from poor students
at Sarah Lawrence College, failed to show differences in this group.

3. Beck’s (1) Z or organization score also failed to make discrimina- 
tions. In fact the latter showed slight mean differences in a direction 
opposite to expectations. The statistically non-significant, but consistent 
trends were toward more emotional control, more conformingness, less 
anxiety on most criteria, more attention to concrete details, and slightly 
greater productivity for the high grade point men. 


Received July 12, 1948. 


References 


1. Beck, S. J. Rorschach’s Test, II. New York: Grune and Stratton, 1945. Pp. xii
+ 402.

2. Edwards, A. L. Statistical analysis. New York: Rinehart and Company, Inc.,
1946. Pp. xviii + 360.

3. Klopfer, B., and Kelly, D. McG. The Rorschach technique. New York: World
Book Company, 1942. Pp. x + 436.

4. Munroe, R. L. Predictions of the adjustment and academic performance of college
students by a modification of the Rorschach method. Stanford University: Stanford
University Press, 1945. No. 7 of the Applied Psychological Monograph. Pp.

5. Rappaport, D., Gill, M., and Schafer, R. Diagnostic psychological testing. Chicago:
Year Book Publishers, Inc., 1946. Pp. xi + 516 (Vol. II).




The OL Key of the Strong Vocational Interest Blank for Men
and Scholastic Success at College Freshmen Level * 


Stanley R. Ostrom 


Department of Public Instruction, Dover, Delaware 


Psychologists have developed instruments that measure abilities and
aptitudes with a fair degree of accuracy. The use of these instruments
for prediction purposes in learning situations has not proved as successful
as one might hope, however. This may be due, to some degree, to non-intellectual
traits which cause some individuals to persevere through
discouragements while others of apparently equal potential fail. The
measurement of these traits has proved most elusive.

Counselors using the Strong Vocational Interest Blank for Men have
to a large degree assumed that the Occupational Level key of the Strong
blank is one approach to this problem. This position is verbalized by
Darley (2, p. 66):

Clinical experience together with limited experimental data would indicate
that the lowest occupational level scores on the revised blank will accompany
the interest type previously defined as “lower level jobs.” Furthermore, an
excessively low occupational level score seems at present to be associated with
lack of “staying power” or “survival power” in college competition. This
hypothesis should be tested as quickly as research data accumulate, by careful
studies of matched groups, since it is a phase of the “level of aspiration” and
general motivational problem.


Strong holds the same position, stating, “Men with high OL scores have
the interests of business executives and professional men, but those with
low scores have the interests of workmen” (5, p. 195). He further
suggests that the key has value for a counselor helping a student plan
his high school or college training program (5, pp. 203-204).

Specific statistical studies for the corroboration of these hypotheses 
are, however, very meager. Berdie (1) reports a correlation of only .03 
between the OL key and academic achievement of forty-three college
students. He also found an equally low correlation, .01, when he compared
the OL scores with curricular satisfaction.

* The author wishes to acknowledge the aid and advice of Dr. Milton E. Hahn in 
planning the study on which this article is based. Special credit should be given Dr. 
William Kendall, Dr. Maurice Troyer, Dr. C. Robert Pace and Dr. Eric Gardner for 
their help in executing and interpreting the results of the research. The author’s 
Doctor’s thesis, from which the study is taken, is on file at Syracuse University. 




Kendall (3), on the other hand, obtained positive results when he 
studied 300 male college freshmen in Syracuse University. He found 
that when academic ability as measured by the Ohio State Psychological 
Examination, Form 21, was held constant three groups distinguished by 
differing levels of OL were found to differ in college achievement. His
three groups consisted of 100 men each of high, average, and low OL. 
The difference between these groups when adjusted for ability by co- 
variance proved significant beyond the five per cent level but not to the 
one per cent level of confidence. Kendall concluded “if used with 
caution OL scores at the extremes of the distribution should be helpful
to the counselor in making judgments concerning individual chances for 
scholastic success.” 

These studies give impetus to the need for further research as sug- 
gested in the last sentence of the statement by Darley referred to above. 

To test further the above hypothesis the writer conducted a study 
in which an attempt was made to determine the relationship between the 
OL key of the Strong Blank and scholastic achievement at three levels of 
education. The following discussion is a report of the findings at the 
college freshman level. 

As is the case each year, the 1946-1947 freshman class at Syracuse 
University participated in a testing program shortly after enrolling in 
school. Among other tests taken by the men were the Ohio State 
Psychological Test, Form 21 and the Strong Vocational Interest Blank 
for Men. From these test data six groups of seventy-five men each were 
chosen according to the following criteria: 

High level, high ability: Men whose OL scores were equal to a standard
score of fifty-seven or above and whose raw scores on the Ohio State
Psychological Examination, Form 21 were ninety and above.

Average level, high ability: Men whose OL scores were between standard
scores of forty-seven and fifty-two, and whose raw scores on the Ohio State
Psychological Examination, Form 21 were ninety and above.

Low level, high ability: Men whose OL scores were equal to a standard score
of forty-five and below, and whose raw scores on the Ohio State Psychological
Examination, Form 21 were ninety and above.


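High level, low ability: Men whose OL scores were equal to a standard
score of fifty-seven or above, and whose raw scores on the Ohio State
Psychological Examination, Form 21 were below ninety.

Average level, low ability: Men whose OL scores were between standard
scores of forty-seven and fifty-two, and whose raw scores on the Ohio
State Psychological Examination, Form 21 were below ninety.

Low level, low ability: Men whose OL scores were equal to a standard score
of forty-five and below, and whose raw scores on the Ohio State Psychological
Examination, Form 21 were below ninety.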


Findings 
The mean honor point ratios were determined for each of the six 
groups. From Table 1, it can be seen that an even step progression from
low to high OL and from low to high ability emerged except in one
instance, that of average to high OL in the low academic group. 


Table 1

Average Honor Point Ratios for Six Groups of Syracuse
University Male Freshmen (Total = 450)

                    Mean Honor Point Ratios
               High OL    Average OL    Low OL
High Ohio       1.742       1.569       1.357
Low Ohio        1.058       1.194       1.036


These data were then subjected to analysis of variance. Table 2 
shows F-ratios for both OL and academic aptitude at magnitudes great 
enough to justify the rejection of the Null Hypothesis at the one per cent 
level of confidence. 


Table 2

Analysis of Variance: Multiple Classification for 450
Syracuse University Male Freshmen
(Determining Effects of Ability and Level)

Source of      Degrees of     Sum of        Mean                 Test of
Variance       Freedom        Squares       Squares     F*       Hypothesis**
Ability             1           238,496     238,496     72.23    Reject
Level               2            37,971      18,985      5.75    Reject
Interaction         2            28,323      14,162      4.27    Doubt
Residual          444         1,467,232       3,302

Total             449         1,772,022


* Where F = greater mean square/lesser mean square. By referring to Snedecor’s 
tables of F (4, pp. 222-225), we may use the following three rules in testing the hypothe- 
sis: (a) reject the hypothesis tested, if the calculated value of F is greater than the 1% 
point given in the tables; (b) accept the hypothesis tested, if the calculated value of F is
less than the 5% point given in the tables; (c) remain in doubt, if the calculated value of 
F lies between the 5% and 1% points given in the tables. 

** The Hypothesis tested is a null hypothesis concerning the difference between 
means of groups, i.e., there is no significant difference between the means of groups. 
(The 1% point necessary for rejection of the Null Hypothesis was 6.70 for ability and 
4.66 for level.) 
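The mean squares and F-ratios of Table 2 can be rechecked from the printed sums of squares and degrees of freedom. The following is a minimal sketch in Python; the names `table2` and `anova_rows` are illustrative assumptions, and the small differences from the printed figures presumably reflect rounding in the original hand computation.

```python
# Recompute Table 2 from its printed sums of squares and degrees of freedom.
# All names here are illustrative; the figures are those printed in Table 2.
table2 = {
    "Ability":     {"df": 1,   "ss": 238_496},
    "Level":       {"df": 2,   "ss": 37_971},
    "Interaction": {"df": 2,   "ss": 28_323},
    "Residual":    {"df": 444, "ss": 1_467_232},
}


def anova_rows(table, error="Residual"):
    """Return {source: (mean_square, F)}; F is None for the error row."""
    ms = {name: row["ss"] / row["df"] for name, row in table.items()}
    return {
        name: (ms[name], None if name == error else ms[name] / ms[error])
        for name in table
    }


rows = anova_rows(table2)
total_ss = sum(row["ss"] for row in table2.values())
total_df = sum(row["df"] for row in table2.values())

for name, (mean_square, f_ratio) in rows.items():
    f_text = "" if f_ratio is None else f"F = {f_ratio:.2f}"
    print(f"{name:<12} MS = {mean_square:>10.1f}  {f_text}")
print(f"Total SS = {total_ss}, total df = {total_df}")
```

Each F is the mean square for an effect divided by the residual mean square; it is then compared with the tabled 5% and 1% points according to the three rules given in the footnote.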


Conclusions and Recommendations 


1. A very significant relationship was established between honor point 
ratio and both academic aptitude and OL in the Syracuse University
freshmen sample. This result strengthens Kendall’s study and gives a
strong case to the use of OL scores in prediction of college success. It 
does not, of course, justify the use of the key as a single measure of motiva- 
tion, but it does point up its rightful place in a predictive battery. 

2. Standardization of OL on a school population. The Occupational 
Level Key of the Strong Blank was standardized by contrasting “unskilled
men” and “business and professional men earning $2,500 and upwards
a year” (5, pp. 185). An obvious result of using such a scale on a
college population is the large number of high OL scores among college 
students. Finding men from the freshman class for the two low OL 
groups was extremely difficult. So difficult, in fact, that it was necessary 
to include men with scaled scores of forty-five to assure groups of seventy- 
five. Setting up an OL key standardized on college groups would un- 
doubtedly result in a sharper instrument. 

3. Follow-up study of college freshmen group. Repeating the college 
freshmen study four years after the original study will be revealing if the 
four year college honor point ratios are available for each group. 

4. Study of the high OL-low ability college freshmen. No reason is 
available to explain the sharp drop in mean honor point ratio between 
the average OL and high OL groups of low ability. An intensive study 
of a generous portion of this group to find answers for this deviation from 
the expected pattern is recommended. 


Received July 21, 1948. 
References 
1. Berdie, R. F. Prediction of college satisfaction and achievement. J. appl. Psychol.,
1944, 28, 239-245.

2. Darley, J. G. Clinical aspects and interpretation of the Strong Vocational Interest
Blank. New York: The Psychological Corporation. 1941.

3. Kendall, W. E. The occupational level scale of the Strong Vocational Interest
Blank for men. J. appl. Psychol., 1947, 31, 283-287.

4. Snedecor, G. W. Statistical methods. Ames, Iowa: Collegiate Press Inc., 1946.

5. Strong, E. K. Vocational interests of men and women. Stanford, California: Stanford
University Press, 1943.




A Note on the Shifts of Interest with Age


E. L. Thorndike 
Professor Emeritus, Columbia University 


Thirty-seven men, all graduate students of education, ranging in age
from 23 to over 40, reported, as well as they could estimate, the relative
strength of the following tendencies, each for himself at the present time,
and for himself at the age of 12: Approval (having people look up to
you); Mastery (being boss); Kindliness (seeing people happy);
Gregariousness¹ (being with one’s own crowd); Studying things; Studying people;
and Studying abstractions.

These men had been studying educational psychology and had a
certain common basis for their definitions of the above. Doubtless,
however, the terms did not mean quite the same things to the different
individuals, and it would probably be impossible to define with precision
just what they did mean to the average of the group. Within limits,
however, these terms do have a community of meaning to them and to
the readers of this note. The change from 12 to adult age (around 30
in the case of this group) was: a loss of 214 steps for Approval; a loss of
1% steps for Mastery; a gain of 14 step for Kindliness; a loss of 2 steps
for Gregariousness; a loss of 114 steps for Studying things; a gain of 344
steps for Studying people; and a gain of 214 steps for Studying abstractions.
For a group of lawyers, or doctors, or engineers, or business men,
the shifts with age might well be different.

These facts seem worth noting, especially the different effect of age
upon the interest in studying things as compared with studying people
and abstractions, and the absence of any substantial change in kindliness.
According to traditional fiction, a boy of twelve is brutal and careless of
others.

These same records can be studied from the point of view of the permanence
of the tendencies as reported. Assuming the validity of the
testimony, the facts show that a person’s nature at 12 is prophetic of
his nature in adult years in this respect (the median correlation for the
37 cases is +.55). The child to whom approval is more cherished than
Mastery is likely to become a man who seeks applause rather than power,
and similarly throughout. The effect of chance errors, forgetfulness,
and the like, is to make this correlation too low. The effect of a constant
error whereby a person projected his opinion of himself to form his
opinion of his own past would be to make the relation closer than it
really was. The net result of eliminating these errors would, I conjecture,
be to raise the correlations somewhat.
Received June 14, 1948. 
1 This perhaps would be more suitably named “a mixture of gregariousness and
sociability.” 


A Fallacy in the Use of Median Scale 
Values in Employee Check Lists 


Clifford E. Jurgensen 
Minneapolis Gas Light Company 


Several investigators (1, 2, 4) have published articles using the 
Thurstone equal-appearing intervals method, or a slightly modified form 
of the method, to select and weight items in a check list to be used for 
rating employees. The author has developed similar unpublished check 
lists and is familiar with a number of other unpublished scales developed 
for or by various companies. It thus appears that the procedure is 
sufficiently used to warrant mentioning a fallacy which appears when 
the equal-appearing interval method is used in an industrial merit 
rating scale. 

Briefly, the method consists of obtaining a large number of state- 
ments which relate to good or poor job performance. Statements are 
printed separately on cards which are then sorted by a large number of 
judges according to the method of equal-appearing intervals. In some 
cases statements are printed serially and judged by encircling a number 
from 1 to 9 preceding each statement, this procedure having been shown 
(5) to give the same results as sorting. The median and semi-inter- 
quartile range for each statement is computed by formula or by nomo- 
graph (3). Statements with a large semi-interquartile range are elimi- 
nated, and the remaining items form a pool from which scale items are 
selected in such manner that statements differ in scale value by approxi- 
mately equal differences. A tentatively selected scale is used experi- 
mentally, tests of item relevancy are made, and the scale is modified 
where necessary. The final scale is used by asking raters to check items 
which describe or apply to the employee being rated. The “score” is 
the median or mean scale value of the checked statements. 
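The screening step described above (a median and a semi-interquartile range for each statement, with widely scattered statements dropped) may be sketched as follows. This is only an illustration: the judgments are invented, the cutoff `MAX_Q` is an assumed value, and the ungrouped median from Python's `statistics` module stands in for the formula or nomograph mentioned in the text.

```python
from statistics import median, quantiles

# Hypothetical judgments: each statement was placed by nine judges on a
# scale of 1 to 9. These numbers are invented purely for illustration.
judgments = {
    "Statement 1": [6, 7, 7, 8, 8, 8, 9, 9, 9],
    "Statement 2": [1, 2, 3, 4, 5, 6, 7, 8, 9],
}

MAX_Q = 1.5  # assumed cutoff for the semi-interquartile range


def scale_statistics(ratings):
    """Return (median, semi-interquartile range) of one statement's ratings."""
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return median(ratings), (q3 - q1) / 2


# Keep only statements on which the judges agree closely enough.
retained = {}
for statement, ratings in judgments.items():
    scale_value, q = scale_statistics(ratings)
    if q <= MAX_Q:
        retained[statement] = scale_value

print(retained)
```

Here the second statement is spread evenly over the whole range by the judges and is therefore eliminated, while the first is retained with its median as scale value.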

The scaled statement technique assumes that all items form a single 
continuum which is factorially pure. This assumption has not even been 
loosely approximated in any employee merit check list seen by the author. 
The typical employee check list contains items dealing with work output, 
quality, learning ability, job skills, personality, work habits, and many 
other types of items. Customary tests of item relevancy are generally 
applied to statements in such check lists, but these tests eliminate only 
those items which have a low or negative correlation for the group of 
persons under consideration. It is quite possible that items may show 
a high positive correlation within a group, but an individual may nevertheless
differ widely from the group tendency. For example, studies
show a relatively high positive correlation between speed and accuracy
of work. It is not uncommon, however, for an industrial supervisor or 
executive to challenge this finding on the basis that some of his workers 
show such great differences in speed and accuracy that the overall finding 
is untenable to him. 

Table 1

Comparison of Two Types of Scale Values
with Reference to Three Employees

                                            Median   Revised       Employee
                                            Scale    Scale
Item                                        Value    Value      A     B     C
Is one of the best employees in the          8.6      3.6       x     x     x
  department
Has unusually good quality                   8.4      3.4       x     x     x
Carries through on all jobs                  8.2      3.2       x     x     x
Is extremely loyal                           8.0      3.0       x     x     x
Gives close attention to instructions of     7.8      2.8       x     x     x
  supervisor
Plans work well                              7.6      2.6       x     x     x
Has good judgment                            7.4      2.4       x     x     x
Learns new work easily                       7.2      2.2             x     x
Is enthusiastic                              7.0      2.0             x     x
Reacts favorably to corrections              6.8      1.8             x     x
Starts work earlier than others              6.6      1.6             x     x
Is a steady worker                           6.4      1.4                   x
Gets help when in difficulty                 6.2      1.2                   x
Profits from past mistakes                   6.0      1.0                   x
Is pleasant and courteous                    5.8       .8                   x
Does fair share of work                      5.6       .6                   x
Does not alibi when corrected                5.4       .4                   x

Total Score based on: Median Scale Value                        8.0   7.6   7.0
                      Revised Scale Value                      21.0  28.6  34.0


For purpose of illustration, Table 1 gives seventeen items in order of 
their scale value as determined by one hundred supervisors. The items 
form the positive or favorable half of a scaled check list. They are all 
satisfactory for use in a check list so far as tests of relevancy and
ambiguity are concerned.

Ratings of three hypothetical employees are given in columns headed 
A, B, and C. (It is assumed that none of these three persons has been 
checked on any items falling below the median scale value of 5.0.) The 
median scale values for the three employees are 8.0, 7.6, and 7.0
respectively. It will be noted that A is the “best” employee because he
does not learn new work easily, is not enthusiastic, does not react favor- 
ably to corrections, etcetera! Employee C is the worst of the three 
employees because he possesses all the listed virtues and performs all the 
favorable actions! 




From a theoretical position it can be contended that the above findings 
will not commonly be found in actual cases if items are properly selected. 
However, the presence of the error of median scale score was originally 
found by the author when it was noticed that the “better” of two em- 
ployees obtained the lower of the two scores on a scale developed by the 
usual approved techniques. Other such cases were subsequently found. 
The decreased validity of the scale (whether large or small) is only one of
the objections to the method. An even more serious objection is that 
the entire scale might fall into disrepute and discard if a few of the raters 
were to discover that overall scores would increase in magnitude if some 
of the favorable (but low value) items were not checked even if applicable 
to the employee being rated. 

A simple solution to the above fallacy is to replace median values by 
positive and negative values obtained by subtracting five from each of 
the item medians. (This assumes that scaling was based on nine equal- 
appearing intervals. The constant would differ for other numbers of 
groups.) The merit rating “score” for each employee is the algebraic sum 
of the revised weights for items checked as applying to the employee. 
For the three hypothetical employees referred to in Table 1, the revised 
scale scores would be 21.0, 28.6, and 34.0. It will be noted that the order
of merit is the reverse of that obtained from the median scale value 
method, and that the revised order is consistent with logic. 
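The two scoring methods can be compared directly on the seventeen items of Table 1. The sketch below is a minimal Python illustration; the names `item_medians`, `items_checked`, and `NEUTRAL_POINT` are assumptions introduced here, and the number of items checked for each employee is read from the columns of the table.

```python
from statistics import median

# Median scale values of the seventeen items of Table 1: 8.6, 8.4, ..., 5.4.
# Building them from integer tenths avoids cumulative floating-point drift.
item_medians = [(86 - 2 * i) / 10 for i in range(17)]

# Number of items checked for each employee, counting from the top of Table 1.
items_checked = {"A": 7, "B": 11, "C": 17}

NEUTRAL_POINT = 5  # constant subtracted when nine equal-appearing intervals are used


def median_score(values):
    """Conventional score: the median scale value of the checked items."""
    return median(values)


def revised_score(values):
    """Proposed score: algebraic sum of (item median - neutral point)."""
    return round(sum(v - NEUTRAL_POINT for v in values), 1)


scores = {}
for employee, k in items_checked.items():
    checked = item_medians[:k]
    scores[employee] = (median_score(checked), revised_score(checked))

for employee, (old, new) in scores.items():
    print(f"Employee {employee}: median score {old}, revised score {new}")
```

Under the median method each additional favorable item of lower scale value pulls the score down, so the employee with the most favorable items checked ranks last; under the summed deviations each such item adds a positive amount, and the order is reversed.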

Previous discussion has been limited to median scale values. Exactly 
the same situation, however, is true for mean scale values. 

The above is proposed as a simple solution to the error of median 
scores. The scoring of scaled check lists on the basis of algebraically 
summed deviations is just as easy as use of mean scale values. Even 
though the validity of the scale may not be increased greatly (for the 
group as a whole) by this change in scoring procedure, scores of specific 
individuals sometimes change appreciably. The use of the inaccurate 
median (or mean) scale value does not appear defensible on logical grounds 
even though it seldom results in significant error. 

Received July 28, 1948. 
References
1. [Author and opening words of title illegible] method of appraisal for assistant managers.

2. Knauft, Edwin B. Construction and use of weighted check-list rating scales for two
industrial situations. J. appl. Psychol., 1948, 32, 63-70.

3. Jurgensen, Clifford E. A nomograph for rapid determination of medians.
Psychometrika, 1943, 8, 265-260.

4. Richardson, M. W., and Kuder, G. F. Making a rating scale that measures. Person.
J., 1933, 12, 36-40.

5. Seashore, R. H., and Hevner, K. A time-saving [illegible] for the construction of
[illegible], 1933, 4, [illegible].




An Empirical Approach to a Problem 
of Psychophysical Scaling * 


William H. Angoff 


Human Engineering Branch, Special Devices Center, 
Port Washington, New York 


Since Thurstone’s original work in the scaling of crimes by means of a
paired-comparison procedure (2), numerous psychological judgments,
including those concerning attitudes, have been ordered to continua in
similar fashion with success. Specifically in industry, a scale has been
developed for the quality of work performed by industrial supervisors
(3). Another application in industry of the paired-comparison technique
may conceivably be that of scaling jobs within an industrial plant. In
this latter instance, job levels may be determined by the combination of
scale values for factors attaching to a particular job, or they may be
determined by the simple scaling of the jobs as a whole without regard to
separate factors. In either instance, the jobs could be ordered to a single
continuum which would then define the hierarchy. The theoretical
defensibility and practical simplicity of such a job evaluation approach
appears to constitute unquestionably an advantage over the procedures
currently in use. However, there would appear to be a practical difficulty
in the situation involved in job evaluation. Wherever job hierarchies
are determined, it is frequently the case that new jobs are added or old
jobs changed as the plant continues to function. If an entirely new scale
of n items or jobs is developed each time the original items or jobs are
altered or increased in number, the procedures of scaling, particularly
where large numbers of judgments are involved, can become a very costly
and time-consuming affair. It is therefore suggested that new items,
whether they are jobs or other judgment-objects, may be inserted and
placed, as they appear, in their proper positions in a scale that has already
been set up and found to be satisfactory; and that a new rescaling of all
items would not then be necessary.

The present study attempts to duplicate in miniature such a situation 
as might obtain in an industrial plant, where a new item is added to a 
scale which has already been determined, and is presumably in use.

* The author would like to express his gratitude to Dr. C. H. Lawshe and Dr. N. C.
Kephart of Purdue University where the work was done for their advice and assistance
in the preparation of the manuscript.




Procedure 


A group of ten male movie actors were chosen who are well known 
to the public, and were used as object-choices. The subjects making the 
comparisons were twelve in number, and relatively advanced in terms of 
level of education, intelligence, and sophistication with regard to tastes 
in moving pictures. The ages of the subjects ranged from 26 to 36 years. 

Forty-five cards were prepared with every one of the ten names of 
the object-choices paired with every other of the remaining names. No 
pair occurred more than once in the deck of 45 cards. The stimulus- 
statement was prepared in advance and read by each of the twelve 
subjects prior to making his choices. The statement read as follows: “In 
the following pairs of movie actors, choose the one you would prefer 
to see in a moving picture. Use whatever basis you please for your 
decision.” The choices for the 45 pairs were made separately and inde- 
pendently by each subject and recorded by the experimenter on the spot. 

It may be noted that no attempt was made in the experiment to assure
uni-dimensionality or high reliability in the eventual scale. The movie 
actors chosen for the study are all current popular favorites, and it was 
expected that there would be much disagreement among the subjects 
with regard to their preference-choices—as indeed there was. Thus as 
much opportunity as possible was provided to permit the scale values to 
be affected by the withdrawal or insertion of items. Also, as was ex- 
pected, the range of scale values that resulted was narrow, permitting 
slight shifts in scale values to exert considerable effects upon the correla- 
tion coefficients that were to be computed. 

With regard to the question of uni-dimensionality, it was felt that 
while the concept is a highly important one in the usual scaling problem, 
it was a consideration not relevant to the problem here. The purpose of 
the present study was to manipulate preference-judgments as they were 
turned in by the judges. The particular manner in which the scale was 
constructed, and the assumptions underlying the construction of scales 
were felt to be matters for separate consideration. 

The choices having been made by all the subjects, a table of paired 
frequencies was drawn up, and a standard-score scale-value was determined
for each movie actor directly from the percentages. The percentages
represented the ratio of his “preferred” frequencies to the total
possible “preferred” frequencies. Constant values were added to each
scale value to convert them to positive numbers, and finally a 10-item
scale was constructed which then constituted the basic or “criterion”
scale.

The specific problem now involved deriving a nine-item scale consisting
of all but one of the items. This nine-item scale would correspond
to the scale referred to above that “has been determined and is presumably
in use.” The tenth item, not included in the scale, would correspond
to the new item which must be inserted into the pre-existing scale. The 
question arises: Can we have our judges make paired comparisons only 
between the new item and the nine old items in order to secure a scale 
value for this new item; or is it necessary to rescale all ten again? That is, 
will the information from n — 1—in this case, nine—paired-comparisons 
give as good a scale as the information derived from n(n — 1)/2—here, 
45—comparisons? 


Table 1

Proportionate Frequencies of Preference of Row
Object to Column Object

Actor     A     B     C     D     E     F     G     H     I     J    Scale Values

A       .500  .417  .250  .250  .333  .417  .333  .333  .250  .250       190
B       .583  .500  .500  .500  .583  .417  .500  .750  .417  .583       705
C       .750  .500  .500  .417  .667  .667  .833  .833  .500  .667       962
D       .750  .500  .583  .500  .583  .667  .583  .917  .583  .750       986
E       .667  .417  .333  .417  .500  .583  .417  .750  .500  .667       685
F       .583  .583  .333  .333  .417  .500  .333  .667  .333  .667       559
G       .667  .500  .167  .417  .583  .667  .500  .917  .417  .833       791
H       .667  .250  .167  .083  .250  .333  .083  .500  .167  .167       000
I       .750  .583  .500  .417  .500  .667  .583  .833  .500  .667       875
J       .750  .417  .333  .250  .333  .333  .167  .833  .333  .500       433
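The scale values in the last column of Table 1 can be approximately reproduced from the proportions themselves. The sketch below rests on an assumed reading of the procedure: each actor's scale value is taken as the unit-normal deviate of his mean row proportion (diagonal entry included), shifted so that the lowest value is zero and expressed in thousandths. The function and variable names are illustrative. Under that assumption the computed values agree with the printed column to within a few units.

```python
from statistics import NormalDist

ACTORS = "ABCDEFGHIJ"

# Table 1: proportion of judgments in which the row actor was preferred
# to the column actor (diagonal entries are .500).
ROWS = """
.500 .417 .250 .250 .333 .417 .333 .333 .250 .250
.583 .500 .500 .500 .583 .417 .500 .750 .417 .583
.750 .500 .500 .417 .667 .667 .833 .833 .500 .667
.750 .500 .583 .500 .583 .667 .583 .917 .583 .750
.667 .417 .333 .417 .500 .583 .417 .750 .500 .667
.583 .583 .333 .333 .417 .500 .333 .667 .333 .667
.667 .500 .167 .417 .583 .667 .500 .917 .417 .833
.667 .250 .167 .083 .250 .333 .083 .500 .167 .167
.750 .583 .500 .417 .500 .667 .583 .833 .500 .667
.750 .417 .333 .250 .333 .333 .167 .833 .333 .500
"""
P = [[float(x) for x in line.split()] for line in ROWS.strip().splitlines()]

# Scale values as printed in Table 1, kept here only for comparison.
PRINTED = dict(zip(ACTORS, [190, 705, 962, 986, 685, 559, 791, 0, 875, 433]))


def scale_values(matrix, labels):
    """Normal deviate of each row's mean proportion, shifted so the minimum is 0.

    Assumed interpretation of the scaling step; values are in thousandths.
    """
    z = {a: NormalDist().inv_cdf(sum(row) / len(row))
         for a, row in zip(labels, matrix)}
    lowest = min(z.values())
    return {a: round(1000 * (v - lowest)) for a, v in z.items()}


criterion = scale_values(P, ACTORS)
for actor in ACTORS:
    print(actor, criterion[actor], PRINTED[actor])
```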


By erasing one at a time the columns and corresponding rows in 
Table 1, ten new scales of nine items each were developed, each time, of 
course, omitting one of the actors. The scale-values in each of these new 
scales were then different from the scale-values in the criterion scale, since 
they had been constructed without consideration of the cells correspond- 
ing to the actor omitted in each instance. At this time the scale-value 
for the omitted actor in each scale was determined independently on 
the basis of the number of times he was preferred to the other nine. It 
is apparent that now his percentage value, and consequently his scale- 
value, was the same as in the criterion scale, since the same number and 
kind of comparisons were computed for him here as had been computed 
for the criterion scale. But since his relative scale status was changed 
because of the changes in the scale-values of the other nine, his relative 
scale-value was accordingly changed.
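Single-item insertion as just described may be sketched in the same way. The code below again assumes that a scale value is the unit-normal deviate of a mean row proportion (diagonal included), with the lowest value set to zero; the omitted actor keeps the deviate from his full row, while the other nine are rescaled without his row and column. The helper names and the small `pearson` function are illustrative assumptions rather than the author's own computing routine.

```python
from math import sqrt
from statistics import NormalDist

ACTORS = "ABCDEFGHIJ"

# Proportions of Table 1 (row actor preferred to column actor).
ROWS = """
.500 .417 .250 .250 .333 .417 .333 .333 .250 .250
.583 .500 .500 .500 .583 .417 .500 .750 .417 .583
.750 .500 .500 .417 .667 .667 .833 .833 .500 .667
.750 .500 .583 .500 .583 .667 .583 .917 .583 .750
.667 .417 .333 .417 .500 .583 .417 .750 .500 .667
.583 .583 .333 .333 .417 .500 .333 .667 .333 .667
.667 .500 .167 .417 .583 .667 .500 .917 .417 .833
.667 .250 .167 .083 .250 .333 .083 .500 .167 .167
.750 .583 .500 .417 .500 .667 .583 .833 .500 .667
.750 .417 .333 .250 .333 .333 .167 .833 .333 .500
"""
P = [[float(x) for x in line.split()] for line in ROWS.strip().splitlines()]


def deviate(proportions):
    """Unit-normal deviate of the mean of a list of proportions."""
    return NormalDist().inv_cdf(sum(proportions) / len(proportions))


def shifted(z):
    """Shift deviates so the lowest is 0 and express them in thousandths."""
    lowest = min(z.values())
    return {a: round(1000 * (v - lowest)) for a, v in z.items()}


def single_item_insertion(omitted):
    """Rescale the other nine among themselves; the omitted actor keeps the
    deviate based on his comparisons with all the others (his full row)."""
    k = ACTORS.index(omitted)
    z = {}
    for i, actor in enumerate(ACTORS):
        row = P[i] if i == k else [p for j, p in enumerate(P[i]) if j != k]
        z[actor] = deviate(row)
    return shifted(z)


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


criterion = shifted({a: deviate(P[i]) for i, a in enumerate(ACTORS)})
derived = single_item_insertion("A")
r = pearson([criterion[a] for a in ACTORS], [derived[a] for a in ACTORS])
print(derived, round(r, 3))
```

Calling `single_item_insertion` for each of the ten actors in turn gives ten derived scales of this kind, each of which can be correlated with the criterion scale.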

The foregoing procedure of drawing out one object-choice at a time
and re-inserting into the scale of nine was then modified to answer the
following question: If single-item insertion in a scale of nine results in a
ten-item scale that is not substantially different from the scale that 
would have resulted had all ten items been originally considered at one 
time (i.e. the criterion scale), then how many items is it possible to insert 
in a scale before the new scale shows an appreciable departure from the 
criterion scale? To this end, six of the ten actors were chosen randomly 
and withdrawn in combination from the scale—first actor X, then actors 
X and Y together, then X, Y, and Z together, and so on. After each 
withdrawal, a scale was constructed of the remaining actors, and the 
withdrawn actors were then reinserted into the scale. 

There were two ways in which these insertions could be made. When 
r actors were withdrawn from the original set of n actors, a scale of n − r
actors was constructed. In order to derive scale-values for the r actors,
they could (a) be paired with each of the n − r actors, thus making
r(n − r) comparisons,¹ or (b) the r actors could be paired with one another
as well as with the n — r actors, thus making r(r — 1)/2 + r(n — r) com- 
parisons. Both of these procedures were carried out. 

To summarize, then, one criterion scale and three sets of so-called 
“derived” scales were developed: 


1. Criterion scale—all ten items used—n(n — 1)/2 = forty-five com- 
parisons. 

2. Single-item insertion into a previously established scale of nine 
items—n — 1 = nine comparisons. (Ten such scales.) 

3. Multiple-item-insertion into a previously established scale, making 
only r(n — r) new comparisons in each case. (Six such scales.) 

4. Multiple-item insertion into a previously established scale of n − r
actors, making r(r − 1)/2 + r(n − r) new comparisons in each case.
(Six such scales.) 
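The numbers of paired comparisons required under the four arrangements listed above follow directly from the formulas. A small Python sketch, with illustrative function and key names, tabulates them for n = 10 and r = 1 to 6:

```python
def comparisons(n, r):
    """Paired comparisons needed when r of n items are withdrawn and re-inserted."""
    return {
        # All n items scaled together (criterion scale).
        "criterion": n * (n - 1) // 2,
        # Procedure (a): each inserted item paired only with the n - r old items.
        "old_only": r * (n - r),
        # Procedure (b): inserted items also paired with one another.
        "old_and_each_other": r * (r - 1) // 2 + r * (n - r),
    }


N_ACTORS = 10
counts = {r: comparisons(N_ACTORS, r) for r in range(1, 7)}

for r, c in counts.items():
    print(r, c["old_only"], c["old_and_each_other"], c["criterion"])
```

For a single inserted item the two procedures coincide at nine comparisons; with six items inserted, procedure (a) calls for 24 and procedure (b) for 39, against 45 for a complete rescaling.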


Results 


The following tables and figures are presented for reference: 

Table 1 is the original matrix of comparison-judgments showing the 
percentage of times the row-object-choice is preferred to the column- 
object-choice. Table 1 also gives the criterion scale, all values considered
positive, which was derived from the matrix.

Table 2 presents the ten scales derived from the method of single-item
insertion, all values positive. The correlations between each of the scales
and the criterion scale appear at the foot of each scale.
1 While the main diagonal of the percentage-preference matrix, corresponding to
self-comparison of each item, was actually [illegible], it is not included in the
discussion [illegible].


Table 2

Single-Item Inserted Scales

Note: The column headings refer to the actor who was withdrawn and re-inserted
into the scale of the remaining nine.

Actor  Criterion     A     B     C     D     E     F     G     H     I     J
         Scale

A         190       334   160   185   158   184   190   130   190   185   185
B         705       836   699   682   655   686   762   655   645   705   659
C         962      1072   966   929   966   948   978   844   904   969   921
D         986      1098  1020   969   926   996  1002   942   904   969   921
E         685       789   709   705   655   679   694   655   622   659   612
F         559       673   523   566   539   568   583   539   506   566   473
G         791       907   803   871   772   780   787   731   692   799   682
H         000       000   000   000   000   000   000   000   000   000   000
I         875       976   874   871   868   898   881   820   809   842   824
J         433       484   429   425   421   452   482   446   316   425   400

r                  .995  .998  .995  .998  .999  .998  .995  .995  .999  .996


Tables 3 and 4 similarly present the scales derived from the method of
multiple-item insertion. Table 3 gives the scales for the r(n − r)
comparisons, and Table 4 gives the scales for the r(r − 1)/2 + r(n − r)
comparisons. Correlations between each of these scales and the criterion
scale similarly appear at the foot of each scale.

Figures 1, 2, and 3 are presented to illustrate graphically the results
shown in Tables 2, 3, and 4. Figure 1 is a graphical presentation of the
appearance of each item on the scales of preference of actors. The
Table 3

Multiple-Item Inserted Scales
(Scale Value for Each Inserted Item Determined
on the Basis of r(n − r) Comparisons)

Note: The column headings refer to the actors withdrawn and re-inserted.

Actor  Criterion     F     F,G   F,G,H  F,G,H,B  F,G,H,B,D  F,G,H,B,D,J
         Scale

A         190       190    121    118     058       000         000
B         705       762    711    640     625       546         475
C         962       978    844    759     775       774         696
D         986      1002    954    852     884       795         685
E         685       694    658    580     600       559         432
F         559       583    557    502     444       406         263
G         791       787    721    605     595       546         349
H         000       000    000    000     000       008         047
I         875       881    818    731     706       686         588
J         433       482    502    370     350       306         355

r                  .998   .988   .992    .988      .984        .943


Table 4

Multiple-Item Inserted Scales
(Scale Value for Each Inserted Item Determined
on the Basis of r(n − r) + r(r − 1)/2 Comparisons)

Note: The column headings refer to the actors withdrawn and re-inserted.

Actor  Criterion     F     F,G   F,G,H  F,G,H,B  F,G,H,B,D  F,G,H,B,D,J
         Scale

A         190       190    121    160     115       146         190
B         705       762    711    682     705       705         705
C         962       978    844    801     832       920         886
D         986      1002    954    894     941       986         986
E         685       694    658    622     657       705         622
F         559       583    517    559     559       559         559
G         791       787    749    791     791       791         791
H         000       000    000    000     000       000         000
I         875       881    818    773     763       832         778
J         433       482    502    412     412       496         433

r                  .998   .989   .990    .990      .995        .994
Fig. 1. Single-item inserted scales (see Table 2). [Plot of the criterion scale
beside the ten derived scales, A through J, on a vertical axis running to 1000;
scale headings refer to the actor withdrawn and reinserted into the scale of the
remaining nine.]


Fig. 2. Multiple-item inserted scales (see Table 3). [Plot of the criterion scale
beside the six derived scales F; F,G; F,G,H; F,G,H,B; F,G,H,B,D; and F,G,H,B,D,J;
scale headings refer to actors withdrawn and reinserted.]


criterion scale is presented here along with the separate scales derived 
from single-item insertion. Figure 2 gives the scales for multiple-item 
insertion where r(n — r) comparisons were made; and Figure 3 gives the 
multiple-item insertion scales when r(r − 1)/2 + r(n − r) comparisons
were made.


As may readily be seen from the tables and figures above, there is little 
doubt that substantially nothing has been altered in the construction of 
the “derived” type of scale. The scale resulting from inserting items 
into a pre-established scale differs negligibly from a scale developed with 
the use of all possible paired comparisons. The correlations between the 
“derived” scales and the criterion scale are, in every instance, .94 or 
over, even when the number of items inserted into the pre-established 


66 William H. Angoff


[Figure 3 plot: the criterion scale and the derived scales F; F,G; F,G,H; F,G,H,B; F,G,H,B,D; and F,G,H,B,D,J* shown side by side on a common vertical axis of scale values.]

*Scale headings refer to actors withdrawn and reinserted.

Fig. 3. Multiple-item inserted scales (see Table 4).


scale exceeds the number already in the scale. In all but one instance 
the correlations are .98 or greater. 


Conclusions 


Generally speaking, it appears that the smaller the number of items
inserted, the higher will be the validity² of the “derived” scale. It is
felt that the validity is roughly inversely proportional to the percentage
of items inserted to the number in the pre-established scale. Particularly
when the scale is not a reliable one, as is probably true in the present
case, insertion of more than 50% will tend to lower the validity of the
scale beyond desirable limits. In the opinion of the author, the ratio,
r/(n - r), should be no greater than .50. The implication for job evaluation
is that when fifty per cent of the present jobs are altered, or a
correspondingly similar proportion of new jobs is added, a new scale of n
items should be drawn up. Even here, it should not be necessary to
make n(n - 1)/2 new comparisons. It would be sufficient to retain the
(n - r)(n - r - 1)/2 old comparisons, and to add to that r(r - 1)/2
+ r(n - r) new comparisons in order to build a new matrix and scale
from the total of n(n - 1)/2 comparisons.

² In the usual sense of the term, “validity” does not strictly apply here. What is
meant by “validity” is the correlation of a “derived” scale with the criterion scale.
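The bookkeeping in the paragraph above can be checked numerically. The sketch below counts the three groups of comparisons for a scale of n items of which r are new, and confirms that they add up to n(n - 1)/2. The function name and the dictionary keys are illustrative and do not come from the article.

```python
def comparison_counts(n, r):
    """Count paired comparisons when r of the n items are newly inserted."""
    retained_old = (n - r) * (n - r - 1) // 2   # old items with one another
    among_new = r * (r - 1) // 2                # new items with one another
    new_vs_old = r * (n - r)                    # each new item with each old item
    full_matrix = n * (n - 1) // 2              # every possible pair of n items
    return {
        "retained_old": retained_old,
        "among_new": among_new,
        "new_vs_old": new_vs_old,
        "full_matrix": full_matrix,
    }


# Example: ten items (as with actors A through J), three of them new.
counts = comparison_counts(10, 3)
print(counts)
```

For ten items with three new ones this gives 21 retained comparisons and 3 + 21 new ones, which together make up the 45 comparisons of the full matrix.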

In general, the greater the number of judgments possible for the r
items, the higher will be the validity of the “derived” scale. That is,
when the r new items are compared one with another as well as with the
n - r old items, higher correlations result between the “derived” scale
and the criterion scale.

It appears from this study that much can be done in the way of modi- 
fying the construction and use of the paired-comparison scale without 
altering appreciably the units along the scale. It is felt that such sta- 
bility deserves further investigation of the paired-comparison technique. 


Unfortunately, as the number of object-choices increases, the number
of paired judgments increases so rapidly that the scale falls down under
its own weight. Additional work is needed, then, to test further the
modifiability of the technique in order to permit a wider range of
application.

The applications of this kind of modification in technique are fairly 
numerous. In the construction of attitude scales, for example, it has 
often been experienced by workers in the field that attitude statements,
while meaningful during a particular period or for a particular group of
subjects, lose their applicability and meaning with the passage of time
or with a change in the characteristics of the group measured. It is at
that time necessary to delete items from the original scale, and sometimes
necessary to add new ones. It is apparent that any change in an
item of the scale will change the complexion of the rest of the items in
the scale. The question to be answered, then, is whether or not the
change in the scale resulting from altering one or more items is sufficiently
large to warrant an entirely new scaling of all items. To a considerable
extent the present study answers such a question. If the empirical findings
here continue to obtain, this type of manipulation with the items of a
scale that has been derived by means of a paired-comparison technique
can become quite extensive before the derived scale can be considered
invalid.

The implications of this technique of “derived” scales for industry 
are fundamental from a more general point of view. Fortunately or 
unfortunately, the industrial situation seldom meets the rigorous
assumptions involved in statistical techniques that are developed on the
statistician’s desk or in the laboratory. Particularly in the industrial 
situation where personal satisfactions and benefits depend so heavily on 
the assignment of a rating or judgment, is it important that the judgments 
be made with greatest regard for precision and care. Ideally the pro- 
cedures adopted for use should conform with the procedures found to be 
most reliable in the laboratory. However, to the extent that practical 
considerations make impossible the use of orthodox scientific techniques, 
modifications must be introduced to conform with what is practicable. 
Still, from the point of view of scientific awareness alone, if nothing else, 
it is similarly necessary to know precisely what is the extent of reduction 
in the validity and reliability of a measuring instrument or technique as 
a result of modifying the orthodox procedures. It is only when he is 
equipped with such knowledge that the psychologist can deal with his 
data in industry with any real assurance. 


Received August 2, 1948. 
References 


1. Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936. 

2. Thurstone, L. L. The method of paired comparison for social values. J. abnorm.
soc. Psychol., 1927, 21, 384-400.

3. ae S., and Richardson, M. W. Item analysis. Person. J., 1933, 12, 




The Paired Comparison Technique for Rating Performance
of Industrial Employees


C. H. Lawshe, N. C. Kephart, and E. J. McCormick


Occupational Research Center, Purdue University 


ce by the paired comparison athe but the mechanics of its 
en were specifically designed to simplify the various pro- 
es, 
The system lends itself to rating any aspect of employee performance,
though in most of its applications it has been used for rating over-all job
performance. The cue for the use of this basic factor as a measure of job
performance is derived from such studies as that of Ewart, Seashore, and
Tiffin (1) which brings out high degrees of communality among factors
typically “measured” on rating scales. These authors identified the
factor “Ability to do the present job” which accounted for most of the
variability of the ratings.

The Personnel Comparison System provides the rater with a booklet


Each slip contains one pair of names. To facilitate typing,
eight slips are initially arranged on one 8½ by 11 form. Pairs of
names are typed on each slip and the slips are later separated by tearing
along perforated lines.

The Personnel Comparison System for Rating Employee Performance, Copyright 
8 by C. H. Lawshe and N. C. Kephart, is available from Mayer and Company, 
Eighth St., Cincinnati 2, Ohio. 



70 C. H. Lawshe, N. C. Kephart, and E. J. McCormick 


Fig. 1. Method of marking pairs in the rating booklet made with the
Personnel Comparison System materials. 


The procedures involved in the administration of the system and the 
subsequent scoring of results follow: 


1. The names of individual pairs are typed on the separate sections of 
the forms according to a pre-determined order which is presented in table 
form. The table provides for pairing each employee with each other 
employee. 

2. The sections are separated on the perforations and the slips are 
assembled into a booklet by means of a paper fastener inserted through 
prepared holes. 

3. The rater checks the preferred name on each slip. 

4. The number of times each individual is preferred is tallied on a
summary sheet.

5. A performance rating index is derived from a table,² the specific
index being determined by the number of times each individual was
preferred and the number of individuals being rated.


² The indexes in the table are based on the proportion of times each individual is
preferred, converted to standard score units. These units are based on a mean of 50
with a standard deviation of 10. Indexes range from approximately three standard
deviations below the mean to approximately three standard deviations above the mean
(actually from 23 to 77).
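The pairing, tallying, and index steps described above can be sketched in a few lines. The published conversion table is not reproduced in the article, so the index function below is only one plausible reading of the footnote: it assumes the proportion of times preferred is passed through the inverse normal curve, rescaled to a mean of 50 and a standard deviation of 10, and held within the stated limits of 23 and 77. All function names are hypothetical.

```python
from itertools import combinations
from statistics import NormalDist


def all_pairs(names):
    """Pair each employee with each other employee exactly once."""
    return list(combinations(names, 2))


def tally(names, preferred):
    """Count how often each name is preferred.

    `preferred` maps each pair (as produced by all_pairs) to the chosen name.
    """
    counts = {name: 0 for name in names}
    for pair in all_pairs(names):
        counts[preferred[pair]] += 1
    return counts


def rating_index(times_preferred, n_rated, low=23, high=77):
    """Assumed conversion: proportion preferred -> standard score (50, 10)."""
    proportion = times_preferred / (n_rated - 1)
    # Keep the proportion strictly inside (0, 1) so the inverse normal is defined.
    proportion = min(max(proportion, 1e-6), 1 - 1e-6)
    z = NormalDist().inv_cdf(proportion)
    # Assumption: extreme cases are held at the limits quoted in the footnote.
    return round(min(max(50 + 10 * z, low), high))


names = ["a", "b", "c"]
choices = {("a", "b"): "a", ("a", "c"): "a", ("b", "c"): "c"}
print(tally(names, choices))
print(len(all_pairs(list(range(24)))))
```

With 24 names the pairing step yields the 276 slips mentioned later in the article. Indexes produced this way need not match the published table entry for entry.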


- Paired Comparison for Rating Performance 71 


Application of the System 


For the purpose of experimentally applying the Personnel Comparison
System in an operating situation, arrangements were made with a paper
manufacturing company to rate employee performance in

ment of a ERANA for the validation of personnel tests, 
as a merit rating procedure. The raters were asked to rate


is performing his present job better?” The two departments
in which the system was tried, and the specific provisions for the
application of the system in each, are given below.


1. Offset press department. Twenty-four of the offset pressmen who
in point of service were rated by three supervisors and an
All four raters had had an opportunity to become familiar 
work of each pressman through the systematic rotation of the 
rom one shift to another. One booklet including all pairs of 
was provided for each rater. Ratings were made inde- 


2. Stereo press department. Eight stereo pressmen on each of three
vere rated. These 24 men had had five or more years of experience 
the While each man had previously been rotated between all 
MS sors, they were classified in terms of their present shift 
and the men on each shift were divided into random halves, 
1, 1-2, 2-1, 2-2, 3-1, and 3-2. 

On one day, each of the three supervisors (designated A, B, and C)
rated those men then on his shift. On the next day, each supervisor
rated the same men along with one-half of the men then on each of the
other shifts. The groups rated by the three supervisors on the first and
second days are indicated in Table 1. In addition, an instructor (Rater
D) rated all of the 24 men.

Table 1

Subgroups Rated by Each of Three Supervisors on Two Days


                 Supervisor Making Rating

Subgroup      First Day      Second Day
  1-1             A             A, C
  1-2             A             A, B
  2-1             B             A, B
  2-2             B             B, C
  3-1             C             B, C
  3-2             C             A, C




Results of Offset Pressmen Study 


The first study, involving the 24 offset pressmen, was conducted to 
determine the reliability of the ratings of different raters. 

The agreement between the four raters is shown in Table 2. This 
table shows the number and per cent of pairs in which the same individual 
was preferred by all four raters; the number of pairs in which three raters 
chose the same man; and the number of pairs in which the raters split 
two-to-two. Of the 276 different pairs rated, all four raters preferred
the same individual in 227, or 82.3 per cent, of the pairs. In 36 pairs,
or 13.0 per cent, three raters preferred the same individual. In the 13
remaining pairs, or 4.7 per cent of the total, two of the raters preferred 
one of the individuals, and the other two raters preferred the other 
individual. 

Table 2 


Distribution of Preferences of Four Raters on Pairs of Twenty-four Offset 
Pressmen by Number and Per Cent of Pairs 


Distribution of
Preferences of Four
Raters on Pairs of        No. of       Per Cent
Employees                 Pairs        of Pairs

4-0                        227           82.3
3-1                         36           13.0
2-2                         13            4.7

                           276          100.0
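A tabulation like Table 2 can be produced mechanically from the raters' booklets. The sketch below classifies each pair by how the raters split and then reports counts and percentages. The input format (one list of chosen names per pair) and the function names are assumptions made for illustration; the article does not describe how the tabulation was actually carried out.

```python
from collections import Counter


def agreement_split(votes):
    """Return a label such as '4-0', '3-1', or '2-2' for one pair.

    `votes` holds the name each rater preferred for that pair.
    """
    largest = max(Counter(votes).values())
    return f"{largest}-{len(votes) - largest}"


def split_distribution(pair_votes):
    """Map each split label to (number of pairs, per cent of pairs)."""
    dist = Counter(agreement_split(v) for v in pair_votes)
    total = sum(dist.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in dist.items()}


# Synthetic votes arranged to reproduce the counts in Table 2.
pair_votes = (
    [["x", "x", "x", "x"]] * 227
    + [["x", "x", "x", "y"]] * 36
    + [["x", "y", "x", "y"]] * 13
)
table = split_distribution(pair_votes)
print(table)
```

Computed this way, 227 of 276 rounds to 82.2 per cent rather than the printed 82.3; the other two percentages agree with the table.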


Intercorrelation of Ratings. Further analysis of the agreement among 
the four raters was accomplished by means of an average intercorrelation 
coefficient of the rank orders of the 24 men as resulting from the ratings 
of each of the four raters; the resulting average intercorrelation coefficient 
of the four rank orders was .94. 
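An average intercorrelation of this kind can be sketched as follows. The article does not give its formula or say how tied tallies were handled, so the code assumes that tied men share the mean of the ranks they occupy, that rho is computed as a product-moment correlation of the ranks, and that the average is a simple mean over all pairs of raters. The function names are illustrative.

```python
from itertools import combinations


def ranks(scores):
    """Rank scores from highest (rank 1); tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def rho(a, b):
    """Rank-order correlation between two lists of times-preferred tallies.

    Undefined (division by zero) if every man in a list has the same tally.
    """
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


def average_intercorrelation(tallies_by_rater):
    """Mean rho over every pair of raters (assumed simple average)."""
    pairs = list(combinations(tallies_by_rater, 2))
    return sum(rho(a, b) for a, b in pairs) / len(pairs)


# Invented tallies for four men as seen by three raters.
example = [[3, 2, 1, 0], [3, 1, 2, 0], [2, 3, 1, 0]]
print(round(average_intercorrelation(example), 2))
```

The same `rho` function applies wherever the article reports a rank-difference correlation between two sets of ratings.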

Reliability of Ratings on Halves and Quarters. In order to examine 
possible differences in reliability that would result from the rating of 
smaller groups of the same employees by the four raters, average inter- 
correlations were also computed for chance halves and chance quarters of 
these 24 offset pressmen. The two chance halves included odd-numbered 
and even-numbered employees respectively, the numbers having been
assigned by alphabetical order of names. The chance quarters, in turn,
were made up of every fourth name in the list in the same fashion. Only 
the preferences on pairs of employees included in the particular chance 
half or chance quarter in question were considered. Within each such 
group the number of times each employee was preferred was tallied, and
rank orders of the men in each group were subsequently determined. The
average intercorrelations, computed by the rank-order method, are given 
in Table 3. These average intercorrelations closely approximate the 
coefficient of .94 obtained with the whole group. Even the correlation 
of .85 can reasonably be considered as satisfactory since only six men are 
involved. 

Table 3

Average Intercorrelations of Rank Order of Times Preferred of Chance Halves
and Chance Quarters of Twenty-four Offset Pressmen


                                   Average
Group                          Intercorrelations

Chance halves
    1st half                         .96
    2nd half                         .93
    Average of 2 halves              .94
Chance quarters
    1st quarter                      .97
    2nd quarter                      .85
    3rd quarter                      .93
    4th quarter                      .94
    Average of 4 quarters            .92


Reliability of Ratings on Restricted Range Group. A further analysis
of this same character was made with respect to a selected group of the
24 pressmen representing a restricted range of talent. The overall group
included three floormen (working supervisors), thirteen “A” pressmen,
seven “B” pressmen, and one helper. The 13 “A” pressmen (who
operate somewhat more complex offset presses) were selected from the
group for separate analysis, and the number of times each of these was
preferred over the others within this same group was tallied. The
resulting average intercorrelation of the rank orders of this group was .79.

This reduction in average intercorrelation from that of the overall
group and from those for the chance halves and chance quarters would
be expected since the group of “A” pressmen was much more restricted
in its range of talent, and, generally speaking, tended to fall within the
central and above-average (though not extreme top) range of the
distribution of the entire group. The floormen consistently were rated above
the “A” pressmen, and to a considerable extent the “B” pressmen and
the helper tended to be rated toward the lower end of the over-all group.

The ratings of these 13 “A” pressmen were then subjected to a
different type of analysis. The relative rank orders of these 13 men were
“extracted” from the rank orders of the entire group; they were then
compared with the rank orders resulting from the preferences on only the
pairs of men in this sub-group. The rho correlation between these two
rank orders was .996, indicating that there was practically no
displacement in rank-order position among these 13 men when their rank order
was derived from the ratings made exclusively on this group, as compared
with their relative rank orders when “extracted” from that of the whole
group.

Results of Stereo Pressmen Study 

As indicated before, the eight stereo pressmen on each shift were split 
into chance halves. On one day each supervisor rated the eight men 
together, and on the subsequent day each supervisor rated the same 
eight men along with one of the halves of each of the other shifts. The 
instructor rated all 24 men on one occasion. 

In order to determine the correlations between subsequent ratings on
men rated twice by the same supervisor, or on ratings by two or more
raters on men rated in common, only the pairs of names pertinent to any
such specific analysis were used in tallying the number of times each man
was preferred. The rank-difference correlation coefficients (rho) between
the several combinations of ratings are given in Table 4.


Table 4

Rank-Difference Correlations (Rho) of Various Ratings on
Twenty-four Stereo Pressmen


             Groups          No. of Men       Coefficient of
Rater        Rated            in Group       Correlation (Rho)

First and Second Ratings by Each of Three Supervisors
A            1-1, 1-2             8                 .98
B            2-1, 2-2             8                1.00
C            3-1, 3-2             8                 .94
Average                                             .97

Ratings by Two Different Supervisors
A & B        1-2, 2-1             8                 .81
A & C        1-1, 3-2             8                 .83
B & C        2-2, 3-1             8                 .86
Average                                             .83

Ratings by Each of Three Supervisors and One Instructor
A & D        1-1, 1-2,           16                 .88
             2-1, 3-2
B & D        1-2, 2-1,           16                 .90
             2-2, 3-1
C & D        1-1, 2-2,           16                 .83
             3-1, 3-2
Average                                             .87


Reliability of Two Ratings by Three Supervisors. The initial analysis
of the ratings of the stereo pressmen was that of the reliability of the
two ratings made by each of the three supervisors of the eight men who
were then under their respective supervision. The rank-difference
correlations (rho) between the two ratings made by each of the supervisors
ranged from .94 to 1.00, with an average of .97, which reflects a highly
satisfactory degree of consistency between the ratings.

Reliability of Ratings Among Three Supervisors. As indicated above,
eight men were rated in common by supervisors A and B, eight others
were rated in common by supervisors A and C, and eight others were
rated in common by supervisors B and C. The rank-difference correlations
between the two ratings of each of these three groups ranged from
.81 to .86, with an average of .83. While these coefficients between
ratings made by different supervisors are somewhat below the coefficients
between the two ratings made by the same supervisors on men whom they
rated on successive days, they can nevertheless be considered as
reflecting an adequate degree of consistency among the three raters.

Reliability of Ratings Between Three Individual Supervisors and One
Instructor. Each supervisor rated 16 men, while all 24 were rated by
the instructor. The rank-difference correlation coefficients between the
ratings of each supervisor and those of the instructor ranged from .83
to .90, with an average of .87.


Table 5

Average Intercorrelations of Ratings by Three Raters of Three
Groups of Eight Stereo Pressmen

                    Average
Groups          Intercorrelation

1-2, 2-1              .84
1-1, 3-2              .76
2-2, 3-1              .87
Average               .82


Reliability of Ratings of Three Raters. Since each of three groups of
eight stereo pressmen was rated by two different supervisors and by the
instructor, it was possible to determine the average intercorrelations of
the rank-orders resulting from the three ratings on each of these groups
of eight men. These average intercorrelations were .76, .84 and .87, the
average of the three being .82. (See Table 5.) This average is lower
than the average of the other measures of reliability previously
mentioned, but is within the same relative range as those of the other
measures.


Administration of Rating System


Time Required for Administration. The time required for applying 
the rating system to the 24 offset pressmen may give a rough indication 
of the practical feasibility of the system in somewhat comparable cir- 
cumstances. It was estimated that it took a total of 12 hours to type 
the slips for the 276 pairs (including carbon copies for the four raters), to 
assemble the four booklets, to rate the workers, and to derive the rating 
indexes. This time did not include planning, conference, or administra- 
tive time, but did include the time required for the rating by all four 
raters. In view of the fact that time required for functions such as
typing and separation of the slips does not increase proportionately with 
the number of different raters, the over-all time is not indicative of the 
time that would be required if the rating were done by one rater rather 
than by four. It is estimated that the time required to prepare material 
and to summarize results for a complete rating of the 24 men by one rater 
would be about five or six hours. 

The actual time required for each rater to rate the 276 pairs, however, 
was only about 30 minutes. This time required for actual rating is 
sufficiently reasonable to raise a question about the comments made by 
Guilford (2) and made in the report of the National Industrial Conference 
Board (3) to the effect that the method of paired comparisons is, by its 
nature, excessively wearying to the raters. More specifically, there is 
reason to doubt the limit of 15 subjects implied by Guilford as the upper 
limit of the practical application of the technique. Perhaps the me- 
chanics of the specific scheme provided for making the ratings have a 
significant bearing on the degree to which the system is acceptable to the 
raters, and consequently on the total number of subjects that can reason- 
ably be rated by one individual. 
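The practical question raised here turns on how quickly the number of pairs grows with the number of employees rated. The sketch below computes that number and, purely as an extrapolation, scales the reported figure of about 30 minutes for 276 pairs in proportion. The assumption that rating time grows linearly with the number of pairs is not made in the article; fatigue or unfamiliar names could well change the rate. Function names are illustrative.

```python
def number_of_pairs(n_rated):
    """Each employee paired once with each other employee: n(n - 1)/2."""
    return n_rated * (n_rated - 1) // 2


def estimated_rating_minutes(n_rated, ref_pairs=276, ref_minutes=30.0):
    """Assumed linear extrapolation from the reported 276 pairs in 30 minutes."""
    return number_of_pairs(n_rated) * ref_minutes / ref_pairs


for n in (15, 24, 40):
    print(n, number_of_pairs(n), round(estimated_rating_minutes(n)))
```

On this assumption, the 15 subjects implied by Guilford as an upper limit correspond to 105 pairs, well under half the number rated here in about half an hour.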

In considering the over-all time required for all the processes there was 
no suggestion that this time was considered excessive by the company 
applying the system to these two groups of workers. 


Summary and Conclusions 


Two groups of 24 workers each were rated by the paired comparison 
technique using the Personnel Comparison System. One of the groups 
included 24 offset pressmen who were all rated by three supervisors and
one instructor. The other group included 24 stereo pressmen, eight
from each of three shifts; each supervisor rated the eight men on his 
own shift on one day, and on the next day he rated the same men along 
with one-half of the men on each of the other shifts, making a total of 
16 men. An instructor rated all 24 stereo pressmen once. 


Analysis of the resulting ratings brought about the following primary
conclusions:

There was a high degree of reliability between the ratings of two or
more raters who rated the same employees.

There was a high degree of reliability between successive ratings,
made on different days by each of three raters, on the employees whom
they individually supervised.

The analysis of the ratings of a selected subgroup of employees


from the ratings on only the selected employees, as compared
with their relative rank-order positions “extracted” from the ratings of
the larger group of which they were a part.

The evidence accumulated did not indicate that the time required
was excessive.


References 


1. Ewart, E., Seashore, S. E., and Tiffin, J. A factor analysis of an industrial merit
rating scale. J. appl. Psychol., 1941, 25, 481-486.

2. Guilford, J. P. Psychometric methods. New York: McGraw-Hill Book Company,
Inc., 1936.

ae rating; methods of appraising ability, efficiency, and aee ois National 


Flesch Count and Readership of Articles 
in a Midwestern Farm Paper * 


Howard B. Lyman 
East Texas State Teachers College, Commerce, Texas 


A preliminary study of the readers of Wallaces’ Farmer and Iowa 
Homestead suggested in March, 1946 that reducing the Flesch count of 
articles from 3.5 to 1.5 might substantially increase the number of sub- 
scribers reading that article. To investigate this clue, a similar survey 
was set up in November, 1946. 

The state of Iowa was divided into alternating counties, designated as 
“A” and “B” for the purposes of this report. The editor reveals that 
there may have been some sectional bias in the results, inasmuch as the 
“A” group of counties were a little heavier towards the southwest and 
the “B” counties heavier to the northwest. 

Papers for November 16, 1946 were run off with four articles printed
in alternate forms (one with a Flesch count of approximately 3.5, the
other with a count of approximately 1.5),¹ two of the difficult and two
of the easy forms appearing in each copy of the issue. Typography,
illustrations, leads, subject matter, and position of the articles were 
identical; only the difficulty level was varied. The experimental copy 
was distributed to all subscribers in the “A” counties, the control copy 
to all subscribers in the “B” counties. An excerpt from both forms of 
Article 4 (Nylons) is given in Figure 1. 


Lower Flesch Count Version

Edna, my neighbor, was lucky.
She has a big family. In 1940, she
bought a ASIA green nylon and wool
coat for Bonnie, her eldest daughter.

Bonnie wore the coat for two
years. Then, when she became a
war bride, she got a new coat that
would match her wedding suit.


Higher Flesch Count Version

Nylon doesn’t always mean just a
precious pair of sheer stockings any
more. It can mean any number of
bright, new garments that are made of
nylon.

There are blouses, slips, children’s
clothes, coats and such things as
curtains, rugs, and upholstery materials.


Fig. 1. Introductory paragraphs from Article 4 (Nylons).


* From data collected and processed by the Farmer-Homestead Poll and made
available to the writer by Donald R. Murphy, Editor of Wallaces’ Farmer and Iowa
Homestead, under whose direction the surveys were made. The writer of this article
has merely prepared the data for publication in this journal, since he feels it suggests a
method of interest to psychologists. Murphy has previously reported the results in an
advertising trade journal (3, 4).

¹ Flesch counts (1) computed by the United States Department of Agriculture.





to thirteen days after publication, interviewers were sent 
instructions for obtaining a random sample of subscribers (“Go 
cross-roads north, turn to the right, call at every third farm”).
If the farmer was a subscriber, he was asked: “Did you HAPPEN to see
anything on this page?” Table 1 shows the final size of the
samples after eliminating non-subscribers.


Table 1 
Size of Subscriber Samples in “A” and “B” Counties 


| “A” Counties “B” Counties Total 

i 73 76 149 

ji 75 83 158 
Total 148 159 307 


itative score for each article was obtained by the following 
n: a rating of 1 if the respondent indicated reading one-quarter or 
of the article; 2, if one-quarter to one-half; 3, if one-half to three- 
ers; and 4, if three-quarters to total. By using this system and cor- 
or difference in size of N and for variation in areas, it was possible 
ine the per cent of greater readership for the articles with the 
ch count. These facts are presented in Table 2. The female 
or articles 1 and 3 and the male scores for article 4 were thrown 
use of the small N’s. 
figures on significance are reported, several statisticians having 
t the data are not amenable to any of the standard tests; how- 
t deriving the corrected quantitative readership score, it was 
four articles showed a positive difference (i.e., an increase in 
for the lower Flesch count) and one a negative difference. The 
unt version of the editorial (Article 2) showed a 9.4% decrease in 
tip for the women. Increases in readership for the other low 
icles ranged from 7.3% to 66.0%. 
es’ Farmer reports that by deliberate attempts to keep copy 
le, they have been able to lower the range of most articles to 
5 and 4.0. Prior to this policy, articles had ranged from about 
Routine reader surveys have shown consistent increases in 
, and the lower Flesch counts are considered at least a con- 
factor to this increased popularity. 
opinion of the writer, even better results may be anticipated 
use of the new Flesch readability yardstick (2). The old 


Table 2

Difference in Readership Scores for Articles When Copy is
Simplified by Use of Flesch Principles

                                       Flesch    No. of     Raw Readership*
Article   Subject     County   Sex     Count     Readers        Score

   1      Hogs          A       M       1.5        40           49.3
                        B       M       3.85       50           57.9
                        A       F       1.5        **            **
                        B       F       3.85       **            **

   2      Editorial     B       M       1.76       36           47.4
                        A       M       4.27       24           32.5
                        B       F       1.76       23           37.7
                        A       F       4.27       23           80.7

   3      Corn          B       M       1.35       51           65.8
                        A       M       3.47       37           47.3
                        B       F       1.35       **            **
                        A       F       3.47       **            **

   4      Nylons        A       M       1.11       **            **
                        B       M       2.48       **            **
                        A       F       1.11       44           52.3
                        B       F       2.48       40           43.7


* After correction of the scores to make them comparable for the two groups, one 
low Flesch count article (2B Males) showed a loss of 9.4% in readership. Two others 
(1 and 3, Females) were dropped because of the small N. The other four showed in- 
creases ranging from 7.3% to 66.0%. 

** Not computed, since N was less than 20. 


formula yielded an ambiguous index in which difficulty and interest were 
combined. The new formula is not only simpler to apply but it measures 
difficulty and interest separately. 

Received June 23, 1948. 


References 


1. Flesch, R. The art of plain talk. New York: Harper and Brothers Publishers, 1946.

2. Flesch, R. A new readability yardstick. J. appl. Psychol., 1948, 32, 221-233. 

3. Murphy, D.R. Test proves short sentences and words get best readership. Printer’s 
Ink, 1947, 218, 61-64. 

4. Murphy, D. R. How plain talk increases readership 45: 6%. Printer’s Ink,
1947, 220, 35-37.



Speed of Reading Nine Point Type in Relation
to Line Width and Leading *


Miles A. Tinker and Donald G. Paterson 
The University of Minnesota 

The writers have previously reported the optimal limits for good
readability within which line width and leading may be varied for 6 point,
8 point, 10 point, 11 point, and 12 point type sizes.¹ The results appear
to be specific for each type size. Nine point type was found to be as
legible as 10 point, 11 point, and 12 point type when each was printed
with two point leading, and with its own optimal line width.

It was believed to be important to establish for 9 point type the same
information previously reported for each of the five type sizes mentioned
above. The purpose of the present study, therefore, is to determine the
limits of variation of line width and leading for 9 point type. The
method used was the same as in previously reported studies.

A table giving detailed tabulated results for the twenty test groups
of 100 sophomore laboratory students each is on file with the American
Documentation Institute.² This table is not reproduced here because
of its excessive size and detail which would be of primary interest only
to the research scholar.

Table 1, however, presents in convenient summary form a guide to be
used by those who desire to specify the optimal limits of variation in
line width and leading when nine point type is to be used.

In setting up the study, we used, as a standard, material printed in
18 pica line widths with 2 point leading. The line width variations and
leading variations shown in Table 1 were each compared in turn with
the standard. The differences are shown as percentage increases or
decreases (minus sign) in speed of reading. For example, the test material
printed in an 8 pica line width, set solid, was read 9.5 per cent more slowly
than the standard, whereas the test material printed in the same short
line width with 1 point leading was read 4.8 per cent more slowly than the
standard. Other entries in Table 1 are to be interpreted in a similar
manner.



* Grateful acknowledgment is given to the Graduate School, University of Minnesota,
for a research grant to finance this study.

¹ Paterson, D. G., and Tinker, M. A. How to make type readable. New York:
Harper and Brothers, 1940. (Obtainable from the writers.) See Chapter 7, pp. 72-81.
Also Appendix I, Methodology, pp. 161-189.

² This table is available as ADI Documents in the form of microfilm (images one inch
high on standard 35 mm. motion picture film), or photoprints (6 X 8 inches in size)
readable with unaided eyes. To secure this table order Document 2626 remitting $0.50
for photocopy or microfilm from American Documentation Institute, Science Service
Building, 1719 N Street, N.W., Washington 6, D. C.





Table 1 
Simultaneous Variation of Line Width and Leading for Nine Point Type

Note: Reading speeds for 8, 14, 18, 30 and 40 pica line widths each set solid and
leaded 1 point, 2 points and 4 points are compared (percentage differences) with reading
speed for Scotch Roman printed in 18 pica line width leaded 2 points as a standard.
Minus (-) differences indicate slower reading than the standard. Figures in bold face
indicate extremely unsatisfactory typographical arrangements. Number of readers
= 2000 university sophomores.


Line        Set       1 Point    2 Point    4 Point
Width       Solid     Leading    Leading    Leading

  8        -9.52      -4.75      -5.76      -6.78
 14        -4.39       0.68       0.46       1.30
 18        -2.72       0.23       0.00       3.24
 30        -5.17      -0.45       2.43       0.40
 40        -5.83      -3.97      -5.81      -2.57


Examination of Table 1 shows that the region of optimal legibility 
ranges between a 14 pica line width with 1 point leading or more to a line 
width of about 30 picas with 1 point leading or more. 

All differences amounting to a 4 per cent or greater decrease in legi- 
bility are indicated by the use of bold face type. Such differences are 
significant beyond the 1 per cent level. This permits ready identification 
of typographical arrangements that should not be used. The 2.72 per 
cent decrease in reading rate for 18 pica line width set solid is significant 
at about the 3 per cent level. While this arrangement may be used 
without a large retardation in reading rate, it is not recommended. The
same is true of the 2.57 per cent decrease in reading rate for 40 pica line 
width with four point leading. 
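
The 4 per cent rule described above can be checked mechanically. The following is only an illustrative sketch: the values are those of Table 1, while the names `TABLE_1` and `unsatisfactory` and the dictionary layout are assumptions made for the example and are not part of the original study.

```python
# Percentage differences in reading speed from Table 1 (9 point type),
# relative to the 18 pica, 2 point leading standard.
# Keys are line widths in picas; inner keys are leading conditions.
TABLE_1 = {
    8:  {"solid": -9.52, "1 pt": -4.75, "2 pt": -5.76, "4 pt": -6.78},
    14: {"solid": -4.39, "1 pt": 0.68, "2 pt": 0.46, "4 pt": 1.30},
    18: {"solid": -2.72, "1 pt": 0.23, "2 pt": 0.00, "4 pt": 3.24},
    30: {"solid": -5.17, "1 pt": -0.45, "2 pt": 2.43, "4 pt": 0.40},
    40: {"solid": -5.83, "1 pt": -3.97, "2 pt": -5.81, "4 pt": -2.57},
}


def unsatisfactory(table, cutoff=-4.0):
    """Return (line width, leading) pairs read at least 4 per cent more slowly."""
    return [
        (width, leading)
        for width, row in table.items()
        for leading, difference in row.items()
        if difference <= cutoff
    ]


flagged = unsatisfactory(TABLE_1)
for width, leading in flagged:
    print(f"{width} pica, {leading}: {TABLE_1[width][leading]:+.2f} per cent")
```

Under this rule the 40 pica line with 1 point leading (a 3.97 per cent decrease) falls just short of the cutoff, and no arrangement in the 18 pica row is flagged.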

As was true with the other type sizes studied and previously reported, 
one can specify line widths for 9 point type over a considerable range (in 
this instance, from 14 to 30 picas) provided one to four points of leading 
are used. Conservative practice would probably specify one or two point
leading for 9 point type in line widths varying from 16 picas to 24 picas. 
Our studies of reader preferences show that readers dislike long lines and 
very short lines. 

Summary 


The present study was carried out to determine the influence of line 
width and leading on the speed of reading 9 point type. 

The results indicate that optimal rate of reading occurs with line 
widths of 14 to 30 picas and with 1 to 4 points leading. This may be 
considered the zone of safety.

A conservative range would be 16 to 24 pica line width with 1 or 2
points leading when 9 point type is used. 

Received June 10, 1948. 


Effect of Target Brightness on “Normal” and 
“Subnormal” Visual Acuity * 


James E. Kuntz ** and Robert B. Sleight †
Division of Education and Applied Psychology, Purdue University 


Ferree and Rand (1) and Ferree, Rand, and Lewis (2) have described
the influence of illumination on the visual acuity of a few persons of
widely divergent ages and with greatly varied visual abilities. They
concluded (2): “Lighting practice has been conventionalized much too
[…] with respect to intensity of light.” Tinker (3) believes that
the studies referred to above “. . . suggest a moderate increase in
illumination for those with corrected vision as compared with normal
[…]”

The purpose of this experiment was to investigate this problem further
by comparing the performance of a group of people with “subnormal”
visual acuity with a group having “normal” visual acuity on a task of
[…] discrimination under varying brightness levels.

In this experiment those persons were considered subnormal who
demonstrated a visual acuity below 1.0 and those considered normal who
had a visual acuity above 1.0 in decimal notation when measurements were
made at a distance of 28 inches and with a brightness level of ten
footlamberts on the same test as used in the experiment reported on in this
[…]. There were 12 Ss in the subnormal group and 12 in the normal
group.

Six brightness levels were used, viz., 3.16, 10, 31.6, 100, 316, and 1000
footlamberts. These conform in log terms to $10^{0.5}$, $10^{1}$, $10^{1.5}$,
$10^{2}$, $10^{2.5}$, and $10^{3}$.
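
The six brightness levels are successive half-log steps, that is, powers of ten with exponents 0.5, 1, 1.5, 2, 2.5, and 3. A minimal sketch, with variable names chosen only for illustration, shows that rounding these powers to three significant figures gives the values quoted:

```python
# Half-log steps: exponents 0.5, 1.0, ..., 3.0 (that is, k/2 for k = 1..6).
exponents = [k / 2 for k in range(1, 7)]
levels = [10 ** e for e in exponents]

# Round to three significant figures, as in the values quoted in the text.
rounded = [float(f"{level:.3g}") for level in levels]

for e, value in zip(exponents, rounded):
    print(f"10^{e} is about {value:g} footlamberts")
```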


* This research was supported by a subcontract between the Purdue Research
Foundation and The Johns Hopkins University. The subcontract was part of
Contract […]166, Task Order I, Project Designation Number NR-784-001, between
[…]ces Center, Office of Naval Research, and The Johns Hopkins University.
This is Report No. 166-I-67 under that contract. The authors wish to express
[…] for the advice given by Drs. N. C. Kephart, L. M. Baker, and J. A. Bromer
[…]ing of this experiment and the preparation of this report.

** […] address: Division of Education and Applied Psychology, Purdue
University.

† […]ative Research, Baltimore 2, Maryland.


Apparatus and Acuity Targets 


Figure 1 is a schematic diagram of the apparatus used in this experiment. 
Illumination was provided by electric lamps (L) varying in wattage from 15 
to 500 mounted on a frame 52 inches square. The lamps were arranged so 
that glare in the visual field was almost entirely eliminated. In the center of 
the frame was an opening 6 inches square into which was fitted an eye shield 
(ES). The subject sat on seat (S) and read the target (T) at a distance of 
28 inches through this opening. A light shield (LS) was provided to eliminate 
direct light from the lamps striking the S’s eye. A hood (H) was also used to 
shield the subject from light reflected into the surrounding room. 


Fig. 1. Schematic diagram of apparatus used for the study of visual acuity (T =
Acuity target, B = Background surface, LS = Light shield, ES = Eye shield, H = Hood,
S = Seat, L = Lamps, SW = Switches, C = Constant voltage transformers, R = […]).


The background (B) consisted of fine-grain wood covered with several coats
of […] paint and extended to the limits of vision. The checkerboard
target (T) was mounted on a piece of cardboard 6 inches square. The
[…] was of approximately the same texture and albedo as the background.

Constant voltage transformers (C) were used […] in the lines. The MacBeth
Illuminometer was used daily, with measurements made at the position of the
[…], to insure correct brightness levels. Brightness levels were readily
changed by means of switches (SW) which controlled the lamps of various
wattages. […] (R) were used to make very minor adjustments in
brightness.

The acuity targets were of the type used in the Bausch and Lomb Ortho-Rater
[…] far and near acuity. Each target consisted of a checkerboard
[…] discriminated from gray areas in order to locate it in one of four



possible positions. The task consisted of locating the position of the
checkerboard in a series of such targets progressively diminishing in size. The actual
size of detail to be discriminated was as follows: 0.0135, 0.0102, 0.0080, 0.0067,
0.0058, and 0.0050 inches. (These sizes in terms of visual angle are 1.66, 1.25,
0.98, 0.82, 0.71, and 0.61 minutes of arc, respectively.) The decimal acuity notations were
determined by calculating the reciprocal of the visual angle subtended by the
task object in each acuity target. The decimal notations which corresponded
to each of the above targets were: .6, .8, 1.0, 1.2, 1.4, and 1.6, respectively.
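
The visual angles and decimal notations can be reproduced from the detail sizes and the 28 inch viewing distance. The sketch below assumes that the angles are expressed in minutes of arc, which is the unit under which the arithmetic agrees with the figures given; the function and variable names are illustrative only.

```python
import math

VIEWING_DISTANCE_IN = 28.0  # viewing distance in inches, as stated in the text
DETAIL_SIZES_IN = [0.0135, 0.0102, 0.0080, 0.0067, 0.0058, 0.0050]


def visual_angle_minutes(size_in, distance_in=VIEWING_DISTANCE_IN):
    """Angle subtended by a detail of the given size, in minutes of arc."""
    return math.degrees(math.atan(size_in / distance_in)) * 60.0


angles = [round(visual_angle_minutes(s), 2) for s in DETAIL_SIZES_IN]

# Decimal acuity notation: reciprocal of the visual angle, to one decimal.
acuities = [round(1.0 / visual_angle_minutes(s), 1) for s in DETAIL_SIZES_IN]

for size, angle, acuity in zip(DETAIL_SIZES_IN, angles, acuities):
    print(f"{size:.4f} in. -> {angle:.2f} min. of arc -> decimal acuity {acuity}")
```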


Procedure


Targets were presented in a sequence designed to help eliminate the
influence of different degrees of motivation, and to cancel the effects of learning
and fatigue. The starting points (levels of brightness) for the Ss were rotated
so that in each group two people began the experiment at each level and
went “up” the brightness scale until the highest brightness was reached.
Ss who did not begin at the lowest level then went to the lowest level and
continued “up” to the point of beginning. There were 10 randomized
presentations with regard to position for each target. If the S made five or more
correct responses out of the ten target presentations, E then proceeded to
administer the next smaller target until a target size was reached on which S
made less than five correct responses in ten trials. All Ss were required to
make a response for each target presentation. The succeeding target
presentations were made in approximately 3 seconds after each response with the
brightness constant at the level being used at the time. There was no time
limit for making responses. Two minutes were allowed for adaptation in
going “up” the brightness scale and 5 minutes when going from the highest
to the lowest level.
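
The rotation of starting levels may be easier to follow as a short sketch. It assumes only what the procedure states: each S goes up the scale from the starting level, wraps from the highest level to the lowest, and stops on returning to the starting point, with 2 minutes of adaptation on each upward step and 5 minutes on the step from highest to lowest. The function names are hypothetical.

```python
LEVELS = [3.16, 10, 31.6, 100, 316, 1000]  # footlamberts


def level_order(start_index, levels=LEVELS):
    """Order of brightness levels for an S who starts at levels[start_index]."""
    return levels[start_index:] + levels[:start_index]


def adaptation_minutes(order, levels=LEVELS):
    """Adaptation time allowed before each level after the first."""
    minutes = []
    for previous, following in zip(order, order[1:]):
        wrapped = previous == levels[-1] and following == levels[0]
        minutes.append(5 if wrapped else 2)
    return minutes


order = level_order(2)  # an S who begins at 31.6 footlamberts
print(order)
print(adaptation_minutes(order))
```

With six levels and twelve Ss per group, assigning two Ss to each starting index from 0 to 5 reproduces the balance described above.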


The Ss used in this experiment were 7 female and 17 male students at
Purdue University. The age range was from twenty to thirty-five years. One
“normal” S was tested with glasses; the remaining Ss were tested without
glasses. All Ss were tested monocularly, the unused eye being occluded by a
card in the eye-shield.

Each S was instructed as follows: “This is a vision test. It is a rather
complete test and will require approximately 40 minutes. You are to locate
the position of the checkerboard in the target which is the same as the target
used in the Ortho-Rater. (All Ss had been ‘Ortho-rated’ previously.) You
[…] with top, bottom, right, or left for each target presented. Each
target will be presented 10 times. In order to determine how well you see,
very small targets will have to be presented, so make the best response you
can for each presentation even though you are not sure of the correctness of
[…]. Relax and rest your eye during the changes of brightness
[…]. You will be given a few minutes to get used to the next level of
brightness. During that period keep your eye fixed on the target background.”


Results


A decimal acuity score was calculated for each S by correcting the
raw scores for chance. The following formula was used in obtaining
this correction: $S_c = \frac{1}{3}(4C - 10)$, where $S_c$ is the corrected score and $C$
is the number of correct responses.

Raw scores for each subject for a particular level of brightness
were obtained by counting the number of correct responses on the target




immediately following the smallest target on which at least 5 correct
responses were made, then correcting this score by using the above
formula. The decimal acuity score was obtained by interpolation. For
example, S made a raw score of 4 on the fourth target in the series. The
preceding target, no. 3, was the last target on which at least 5 correct
responses were made. Target no. 3 subtends a visual angle of 1.0, giving a
decimal acuity notation of 1.0 also. A raw score of 4 out of 10 corrects to 2,
and 2/10 of the interval of .20 (the difference in decimal acuity notation
for targets no. 3 and no. 4) is .040. Interpolating in this way gave a
decimal acuity notation of 1.040 for S for any one level of brightness.
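
The scoring rule can be sketched as follows. The chance correction is written in the usual rights-minus-wrongs form for four alternatives and ten trials, which reduces to (4C − 10)/3 and reproduces the worked example of 1.040. The function names, the zero-based target index, and the absence of any handling for negative corrected scores or for an S who passes the smallest target are assumptions of this sketch; the report does not say how such cases were treated.

```python
DECIMAL_NOTATIONS = [0.6, 0.8, 1.0, 1.2, 1.4, 1.6]  # targets no. 1 to no. 6


def corrected_score(correct, trials=10, choices=4):
    """Correct a raw score for chance: rights minus wrongs / (choices - 1).

    For 10 trials and 4 positions this equals (4 * correct - 10) / 3.
    """
    wrong = trials - correct
    return correct - wrong / (choices - 1)


def decimal_acuity(last_passed_index, correct_on_next, trials=10):
    """Interpolate between the last target passed and the next smaller one."""
    base = DECIMAL_NOTATIONS[last_passed_index]
    step = DECIMAL_NOTATIONS[last_passed_index + 1] - base
    return base + step * corrected_score(correct_on_next, trials) / trials


# Worked example from the text: target no. 3 passed, raw score 4 on no. 4.
example = decimal_acuity(last_passed_index=2, correct_on_next=4)
print(round(example, 3))
```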

The analysis of variance of the decimal acuities for the subnormal 
group is given in Table 1 and a similar analysis for the normal group is 
given in Table 2. The two analyses give essentially the same results 


Table 1
Analysis of Variance of Decimal Acuities of the Subnormal Group

                              Sum of            Estimate of
Source of Variation           Squares     df    Variance        F
Between Brightness Levels      1.65        5      .329        32.90*
Between Subjects               1.17       11      .107        10.70*
Interaction                     .56       55      .010
Total                          3.38       71

* Significant at the 1% level of confidence.


Table 2
Analysis of Variance of Decimal Acuities of the Normal Group

                              Sum of            Estimate of
Source of Variation           Squares     df    Variance        F
Between Brightness Levels       .33        5      .066         4.71*
Between Subjects                .18       11      .016         1.14
Interaction                     .78       55      .014
Total                          1.29       71

* Significant at the 1% level of confidence.


with the exception that the normal group could be regarded as more 
homogeneous than the subnormal group. The F values found show
that levels of brightness play an important role in determining acuity 
scores for both groups. 
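
The F ratios in the two analyses follow from the sums of squares and the design of 6 brightness levels by 12 Ss, with the interaction serving as the error term. The sketch below takes the sums of squares as 1.65, 1.17, and .56 for the subnormal group (which add to the total of 3.38) and .33, .18, and .78 for the normal group (total 1.29). Because these published sums are rounded to two decimals, the recomputed ratios agree with the tabled F values only approximately. The function name is illustrative.

```python
def f_ratios(ss_levels, ss_subjects, ss_interaction, n_levels=6, n_subjects=12):
    """F ratios for a levels-by-subjects layout with one score per cell."""
    df_levels = n_levels - 1
    df_subjects = n_subjects - 1
    df_interaction = df_levels * df_subjects
    ms_interaction = ss_interaction / df_interaction  # error term
    return {
        "df": (df_levels, df_subjects, df_interaction),
        "F_levels": (ss_levels / df_levels) / ms_interaction,
        "F_subjects": (ss_subjects / df_subjects) / ms_interaction,
    }


subnormal = f_ratios(1.65, 1.17, 0.56)
normal = f_ratios(0.33, 0.18, 0.78)

for name, result in (("subnormal", subnormal), ("normal", normal)):
    print(name, result["df"],
          round(result["F_levels"], 2), round(result["F_subjects"], 2))
```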

Figure 2 shows graphically the relation between acuity and levels of
brightness for the two groups. The mean decimal acuity for the sub- 
normal group was .668 as compared to 1.057 for the normal group at the 




lowest level of brightness, viz., 3.16 footlamberts.¹ This level resulted
in minimum acuity for both groups. Maximum acuity was reached by
both groups at 1000 footlamberts, the mean decimal acuity being 1.061
for the subnormal group and 1.264 for the normal group. The subnormal
group had a mean decimal acuity gain of .393 as compared to .203 for the
normal group. The significance of the difference of mean gains resulted
in a “t” value of 2.54. (A “t” value of 2.819 is required for 1% level of
confidence, 2.074 for 5% level of confidence.)

The significance of the difference of the slopes of straight lines fitted
to the means of each group by the method of least squares resulted in


Fig. 2. Variation of mean visual acuity for “Normal” and “Subnormal” Groups with
level of target brightness (footlamberts). N for Normals = 12, N for Subnormals = 12.


¹ The reader may be more familiar with light measurement in terms of footcandles.
The footcandle is a photometric measure which specifies the quantity of light falling
on a surface. A one-candle source delivers one footcandle of illumination on a
surface when the surface is at a distance of one foot. The footlambert is also a
photometric measure, but it quantifies the amount of light coming back from a reflecting
surface. A perfectly reflecting surface which has one footcandle of illumination on it
has a brightness of one footlambert. In order to measure the brightness of a
surface it is necessary to multiply the illumination on the surface (in footcandles) by
the reflectance of the surface. For example, if a surface reflects 80% of the
light that falls on it then when one footcandle of illumination is put on the surface,
it has a brightness of 0.8 footlamberts. Apparent footcandle, another frequently
used term, is equivalent to footlambert.




a “t” value of 3.15 which is significant well beyond the 1% level of con- 
fidence. This technique, which takes into consideration all of the means 
of the two groups for each level of brightness, probably gives a more 
true indication of the effect of brightness increase than does a considera- 
tion of the two extreme means, viz., at 3.16 and 1000 footlamberts of 
brightness. 

By further analysis it was determined that the lower one-half of the 
subnormal group, i.e., the six Ss showing lowest acuity, made a gain of 
.429 as compared to .393 for the entire subnormal group. This gives
some indication that the poorer the initial visual acuity the more benefi- 
cial an increase in target brightness becomes. 

Also, as shown in Figure 2, if the curves were smoothed, it is of 
interest to note that the subnormal group reached “average” visual acuity 
(1.0 decimal notation) at a level of about 40 footlamberts with little 
change thereafter. Further, it will be noticed that the normal and sub- 
normal groups attained equal visual acuity at about 3.16 and 1000 foot- 
lamberts, respectively. 

The per cent of maximum acuity is shown in Table 3. The sub- 
normal group benefited most from increased brightness as shown by a 
gain of 37.1% as compared to 26.4% for the normal group. 


Table 3
Per Cent of Maximum Acuity * at Each Brightness Level

Level
(footlamberts)     Subnormal Group     Normal Group
    3.16                62.9               83.6
   10                   70.5               91.6
   31.6                 91.5               92.9
  100                   95.4               97.5
  316                   97.5               97.1
 1000                  100.0              100.0
Total Gain              37.1               26.4


* Maximum acuity is defined as mean acuity of each group at 1000 footlamberts. 
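
Since each entry of Table 3 is a group mean expressed as a percentage of that group's mean at 1000 footlamberts (1.061 for the subnormal group, 1.264 for the normal group), approximate mean acuities at the intermediate levels can be recovered by reversing the calculation. This is a sketch only: the recovered means are limited by the rounding of the percentages and are not values reported by the authors, and the names used are illustrative.

```python
LEVELS = [3.16, 10, 31.6, 100, 316, 1000]  # footlamberts
PER_CENT = {
    "subnormal": [62.9, 70.5, 91.5, 95.4, 97.5, 100.0],
    "normal": [83.6, 91.6, 92.9, 97.5, 97.1, 100.0],
}
MAXIMUM = {"subnormal": 1.061, "normal": 1.264}  # mean acuity at 1000 footlamberts


def approximate_means(group):
    """Approximate mean decimal acuity at each level, from Table 3."""
    return [round(p / 100.0 * MAXIMUM[group], 3) for p in PER_CENT[group]]


for group in ("subnormal", "normal"):
    for level, mean in zip(LEVELS, approximate_means(group)):
        print(f"{group:9s} {level:7.2f} footlamberts: about {mean:.3f}")
```

On this reading the subnormal group passes a mean acuity of 1.0 between 31.6 and 100 footlamberts, which agrees with the smoothed-curve estimate of about 40 footlamberts given in the text.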


Tables 4 and 5 show the significance of the difference of means for 
the subnormal and normal groups between each level of brightness and 
every other level of brightness. 

The only significant difference between means at successive levels of
brightness is found between 10 and 31.6 footlamberts with the subnormal 
group, as shown in Table 4. When comparing the mean at 3.16 with the 
means at each other level of brightness all are significant beyond the 1% 
level except one, viz., between means at 3.16 and 10. It seems especially 




Table 4
Critical Ratio of Differences Among Acuity Means at Each Level of
Brightness for the Subnormal Group

Level     3.16     10       31.6      100       316       1000
3.16               1.95     7.40 *    8.40 *    8.94 *    9.59 *
10                          5.44 *    6.45 *    6.99 *    7.64 *
31.6                                  1.01      1.55      2.19 †
100                                             0.54      1.19
316                                                       0.65

* Significant at the 1% level of confidence.
† Significant at the 5% level of confidence.


Table 5
Critical Ratio of Differences Among Acuity Means at Each Level of
Brightness for the Normal Group

Level     3.16     10       31.6      100       316       1000
3.16               2.06     2.40 †    3.61 *    3.50 *    4.25 *
10                          0.34      1.55      1.44      2.20 †
31.6                                  1.21      1.10      1.85
100                                            —0.11      0.64
316                                                       0.76

* Significant at the 1% level of confidence.
† Significant at the 5% level of confidence.


important to point out that there are no significant differences (at 1%
confidence level) between brightness intensity 31.6 and any higher
level within the range of brightnesses used.

As shown in Table 5 the critical ratios for the differences among means
for the normal group are in general considerably smaller than for the
subnormal group, although when comparing the mean at 3.16 with means
at other levels of brightness all differences are significant beyond the
5% level of confidence with one exception; the difference between the
means at 3.16 and 10 is significant just slightly below the 5% level.
A slight reversal can be noted, viz., between means at 100 and 316.
For this group the fact that no significant gain (at 1% confidence level)
in performance is obtained when the brightness is raised above 10
footlamberts may be of considerable consequence.

The average variability calculated by averaging the sigmas for each




group at each level of brightness was .154 and .103 (decimal notation) for 
the subnormal and normal groups, respectively. The standard deviation 
for the subnormal group for the lowest level of brightness was .150 and 
for the highest level of brightness .151. For the normal group the 
standard deviation at the lowest level was .175 and for the highest 
level .088. 

Discussion 

The findings of this experiment give additional confirmatory evidence 
that brightness is one of the primary factors in vision. Several previous 
investigations have shown conclusively that a person’s visual acuity 
increases with target brightness. The present study, however, may be 
distinguished from most of the other investigations because it showed that 
the degree of gain in terms of ability to discriminate visually fine detail 
was relatively greater for those persons having initial below normal visual 
acuity, than for those having initial above normal acuity, when target 
brightness was increased. 

It is not felt that the findings of this experiment warrant as specific
a proposal concerning prescription of illumination intensities as was made
by Ferree, Rand and Lewis (2): “In each case the individual needs should 
be determined and the intensity given that is required.” However, it 
does permit the more generalized statement that when visual acuity is 
the primary concern, light levels should be relatively “high” on jobs 
requiring the seeing of details where persons with reduced visual acuities 
are employed. In all likelihood little advantage would be gained for 
these people by prescribing more than approximately 31.6 footlamberts. 
For individuals with normal visual acuity it is probable that there would 
be only slight advantage in prescribing more than approximately 10
footlamberts. 

It should be obvious why the authors of this article hesitate to
recommend specific brightness levels for specific individuals. For one
thing, the steps used were half-log steps and somewhat gross. Also, 
when individual cases are considered, certain complicating factors may 
be encountered even in evaluating a light level on the basis of threshold 
measurements. Not the least of these may be the motivational factors
acting on the individual due to his relating a target brightness to the light 
under which he has been accustomed to working. 

It should be borne in mind that this experiment was of a visual 
threshold nature. Tinker (3) believes that: “One should not prescribe 
illumination for suprathreshold tasks in terms of threshold measure- 
ments.” The problem of optimum light intensities to minimize fatigue 
from prolonged tasks has not been covered in this investigation. Maximum
acuity does not necessarily imply optimum working conditions because of




[…] variables which may be included in the overall situation. However,
until a satisfactory criterion for “visual fatigue” has been ascertained,
it would seem desirable to utilize the findings from threshold
experiments in choosing desirable light levels for “fine” tasks. Naturally
[…]on of any feature of the working environment should take
[…] of the attitudinal viewpoint of the worker.

The findings of this experiment may have extensive ramifications,
[…] in two interrelated endeavors, viz., (1) establishment of
illumination standards, and (2) selection and placement policies in
[…] wherein an employee’s visual acuity may be a factor in satis[…]

Summary and Conclusion 


An experiment was performed to determine whether the amount of
gain in visual acuity, with increase of brightness on targets, differs
[…] for persons with initial “subnormal” acuity from those with
initial “normal” acuity. The experiment was of a threshold nature with
Ss locating checkerboard targets under six levels of target brightness
ranging from 3.16 footlamberts to 1000 footlamberts.


References 


1. Ferree, C. E., and Rand, Gertrude. The effect of intensity of illumination on the
near point of vision and a comparison of the effect for presbyopic and
non-presbyopic eyes. Trans. Illum. Engng. Soc., 1933, 28, 590-611.

2. Ferree, C. E., Rand, Gertrude, and Lewis, E. F. The effect of increase of intensity
of light on visual acuity of presbyopic and non-presbyopic eyes. Trans. Illum.
Engng. Soc., 1934, 29, 296-313.

3. Tinker, M. A. Illumination standards for effective and easy seeing. Psychol. Bull.,
[…]


Book Reviews 


Lawshe, Jr., C. H. Principles of personnel testing. New York:
McGraw-Hill Book Co., 1948. Pp. 227. $3.50.


This book is an elementary treatment of the problems involved in the 
testing of employees for purposes of selection. The expected phases are 
covered, namely test construction and validation (Chaps. II, III, IV and 
XIII), review of previous findings concerning the effectiveness of em- 
ployment tests (Chaps. V through XII), and establishment of testing 
programs (principally Chap. XIV). 

The approach taken by Lawshe will meet the approval of industrial 
psychologists. He is concerned with many of the problems in testing 
that can be appreciated only by one who has worked directly in industry. 
However, in the reviewer’s opinion the book gives only a general overview 
of the field. The coverage of important problems, methods, and findings 
is spotty, and any reader, virgin to testing, may obtain an incorrect, and 
certainly an incomplete, picture. 

The problems dealing with test construction and validation are ade- 
quate as far as they go—but they do not go far enough. The concept of 
reliability is not mentioned either in connection with criteria or with 
tests. There is no treatment of the combination or weighting of tests in 
a battery. Lawshe states that the purpose of the book is to serve as an 
aid to management. But the level at which the book is cast suggests 
that the author is underestimating the intellectual capacities of his 
potential readers. The point of view seems to be that concepts of second 
order difficulty, even though they be fundamental in nature, have no 
place in an introductory presentation. This simplified treatment is 
likely to convey to the wrong people the erroneous impression that any- 
one can develop and operate a testing program without getting into 
difficulties.

The chapters concerned with reporting previous findings on the
validity of tests will be disappointing to many. It is by no means an
extensive review. Rather these chapters are intended to present
illustrative findings concerning the usefulness of different types of tests for
various kinds of jobs. Several comments seem pertinent in this
connection. Not a single example of the work of the U. S. Employment
Service on aptitude testing is cited. Yet surely Stead and Shartle’s now
classic “Occupational Counseling Techniques” which summarizes so
many of these excellent investigations deserves mention here. However,




of considerable value are the findings of a number of investigations not
reported elsewhere. Of the validation studies cited, only a very few
negative findings are given. Thus tests are hardly ever put in an
unfavorable light. It reminds one of the optimistic descriptions of
[…] given in the manuals of directions of so many published tests.
Members of management or unions who believe this to be the true
[…] are in for a sorry disappointment. Even a cursory review of
published reports will indicate that there is considerable variation in the
[…]ness of any given test applied to different groups of workers on
the same job.

[…] in fairness to the author it should be pointed out that he does not
subscribe to the rather extreme views as presented in his publisher’s
[…]. Thus while the publisher claims that “From now on you can
[…] the right person in the right job every time!”, Lawshe emphasizes
that tests are by no means a cure-all and can simply increase the
probability of selecting better employees.

The third topic, establishment of testing programs, is likely to be
[…] as being another set of “practical” rules. This would be an
[…]. While one might have hoped for a more expanded treatment,
Lawshe here deals with most important problems. Today any book in
the field purporting to be a “Principles” of employee testing would be
[…] inadequate if it avoided such areas as supervisory support,
[…] labor unions, personnel records and management reports. The
day when the psychologist’s task begins with the writing of items and
ends with the computation of the validity coefficient is past, if it ever
[…]. Lawshe recognizes this and entertains for discussion certain of
the important implications of testing in its larger setting of personnel
[…] and labor relations.

In sum, the reviewer's chief criticisms of this book are omissions of
fundamental problems and concepts, and the elementary nature of the
presentation. Those who use the book as a text either in college courses
in testing or in similar courses for members of management or labor
[…] will undoubtedly find it necessary to provide supplementary
[…] and discussion.

Edwin E. Ghiselli
University of California
Berkeley, California


Chapin, F. Stuart. Experimental designs in sociological research. New
York: Harper and Brothers, 1947. Pp. x+206. $3.00.

The breaking away of psychology from philosophy and the attempts
of psychologists to convert their discipline into an exact science resulted




in a professional compartmentalization, the unfortunate effects of which
have only recently become clear to many psychologists. In particular, 
we have been insufficiently aware of the attempt of sociologists to estab- 
lish their field as a science, for they broke away from philosophy after we 
did and our natural science orientation has generally kept us from ob- 
serving their efforts and progress. 

Several works have in recent years attempted to review and con- 
solidate the scientific gains made by sociologists: Lundberg’s Foundations 
of Sociology and Greenwood’s Experimental Sociology come to mind as 
illustrations. Chapin’s little volume is another distinctive contribution. 
As he puts it in his preface, his purpose is “to illustrate the method of 
experimental design by reproducing concrete studies,” to provide “a 
source book of examples of specific application (of the fundamental logic 
of experimental designs) analyzed in some detail” (ix). This is done by 
analyzing the methods used in nine experimental studies. Both the 
methods and the findings are of interest to applied psychologists. 

Chapin classifies the experimental designs used in sociological re- 
search under three headings: the cross-sectional study, the projected 
(before and after) design, and the ex-post-facto (retrospective) design. 
His interest in these types of experimentation arises from the fact that 
they can be used in real-life situations and are not limited to the
laboratory or classroom. He points out, for example, that social legislation
(e.g., slum clearance and work relief) is social experimentation, and his 
interest is in experimental designs which can be used in the evaluation 
of the effects of such experimentation. 

The detailed analyses of experimental designs are stimulating in that 
they point up the possibilities of research in practical situations, and 
helpful in that they make clear the weaknesses and advantages of various 
procedures. Chapin brings out, for example, the inability of the social 
scientist to emulate the natural scientist in controlling all but one of the 
variables in an experiment. He analyzes the alternatives and shows how 
one of the best (randomization) is not generally feasible in real life (e.g. 
WPA workers were selected not only on the basis of need but also on the 
basis of employability, thereby making them differ from the direct-relief 
clients with whom they were to be compared in a study of the effects of 
work relief on morale). He then examines the use of available
experimental groups and control groups as a solution. One such study (99-124)
evaluating the effect of high-school graduation on economic adjustment 
(an ex-post-facto study) is analyzed to show the relative effects, on both 
numbers and definitiveness of results, of the precise matching of indi- 
viduals and of the grosser matching of distributions. Matched distribu- 
tions yielded experimental and control groups of 145 each from an original 


[…] 194, contrasted with groups of 23 matched individuals each
([…] six variables). But the former method showed an insignificant
difference in the economic adjustments of graduates and drop-outs,
whereas the latter method showed that the graduates were clearly more
[…]. The emphasis on the need for the repetition of experiments
in other situations, in order to test the justifiability of generalizations,
is noteworthy.

A chapter and appendices dealing with sociometric scales should be of
[…] interest to psychologists. Some of these, both psychological
and sociological, are already familiar, but the emphasis is on new
instruments such as Chapin’s Social Participation Scale and the revision of his
Social Status Scale.

There is an interesting but, in this reviewer’s opinion, unsuccessful
attempt to justify cause-and-effect conclusions from indices of
association. Chapin launches it by pointing out that the only alternatives are
[…] chaos, magic, or means-end relationships. But the logic is
fallacious, because the existence of such alternatives does not make it
[…]y to conclude that one of a particular pair of associated
variables is the cause of the other. One may accept the principle of
causation without being justified in concluding that, since the differences
in the social adjustments of WPA workers and recipients of direct relief
are statistically significant, and since the groups were matched on seven
[…]s, work relief has a more beneficial effect than direct relief (p. 42).
It is conceivable that better-adjusted relief clients were selected for work
relief (the reviewer knows of WPA projects in which this was the standard
practice), in which case superior adjustment was the cause of receiving
work relief, rather than work relief being the cause of superior
adjustment. Belief in causation does not indicate the direction of specific
cause-and-effect relationships. Statistics show association; the attribution of
[…] connections is a process of deduction. But this recurrent fallacy
[…] importance, provided the reader is aware of it.

This is a valuable book for those who are concerned with the design
of experiments in social, clinical, educational, and vocational psychology,
[…] as research workers or as instructors. Its exposition is clear,
[…] in illustrative material, and the research principles which it
[…] are of widespread importance in the social sciences. It will
serve to introduce psychologists to an aspect of contemporary
[…] which many are too unaware.

Donald E. Super
Department of Guidance, Teachers College
Columbia University




J. G. Darley, Chairman, et al. The use of tests in college. Washington, 
D. C.: American Council on Education, 1947. Pp. vii+82. $1.00. 

Froehlich, Clifford P., and Benson, Arthur L. Guidance testing.
Chicago: Science Research Associates, 1948. Pp. viii+104.
$1.00.


Clarification of problems through fresh insights regarding them can 
be a fruitful approach. Although this American Council on Education 
publication is addressed to a college audience, it contains much of value 
for the users of tests at most educational levels. 

The frame of reference centers upon five questions, “Who shall be 
admitted?”, “How shall students choose appropriate curriculums?”, 
“How shall we counsel students?”, “How shall we measure outcomes?”, 
“How do we measure behavior?” This method makes the material 
useful to college and secondary school administrators, counselors, and 
thoughtful instructors. The recommendations regarding the use of tests 
are properly cautious and practicable. 

A major value of the publication is its interpretative approach to test 
use in terms of generalizations, rather than use of a multitude of specifics 
related to particular tests. The reader is urged to consider carefully 
Section V, How shall we measure outcomes? (pp. 43-57). The presentation 
of an examination structure for colleges on pages 45 and 46 supplies an 
excellent general framework for considering test results in relationship 
to other types of data and several kinds of personnel workers. 

In the reviewer's opinion, the objectives and implications stated in
the Foreword by Dean T. R. McConnell and Dean E. G. Williamson are 
well met. It is his opinion also that it is a somewhat restricted but 
generally excellent presentation. Strongly recommended reading for 
student personnel workers and administrators in educational institutions. 

Froehlich and Benson say: “This book is addressed to those individuals 
who are faced with the responsibility of carrying on a guidance program 
in which they must directly or indirectly administer and interpret tests, 
even though their training in tests and measurements is limited (p. v).” 

Guidance Testing poses again the problem of waiting until workers 
are competent before using tests or urging that they gain this competence 
by using tests cautiously. The book is definitely posited on the very 
practical philosophy that naive personnel in education will use tests and 
that it is sensible to help them to avoid errors in practice. 

A major strength is the frank facing of the fact that proper use of
tests requires statistical knowledge; the authors accordingly include
descriptions of simple statistical methods and interpretations. This is
in line with the viewpoints of Bingham, Crawford, Darley, and others.

The authors present typical tests of various kinds under the rubrics 




aptitude, achievement, interest, personal adjustment, and special 
(pp. 23-46). A footnote (p. 23) calls attention to the fact that 
use these tests as examples, not as a selected list of the best 
nts. Although agreement on the best instruments is probably 
le, some evaluation of excellence might have been helpful. One 
e most frequently asked and most legitimate questions of the new 
is, “What are the best tests for my purposes?” Perhaps the 
is, “Consult the nearest competent person.” 

eeling of the reviewer is that this is a helpful and useful book. 
particularly true if the audience for which it is intended follows 
with graduate training which will make the book no longer 


Milton E. Hahn 
University of California at Los Angeles


Erickson, Clifford E. (Editor). A basic text for guidance workers. New
York: Prentice-Hall, Inc., 1947. Pp. 566. $4.25.
editor of this volume, Dr. Erickson, who is Professor of Education 
ctor of the Institute of Counseling, Testing, and Guidance at 
State College, has written several previous books in the guid- 
field. Drawing heavily upon Michigan State College guidance 
s as well as upon a variety of guidance experts in other school 
this text “attempts to portray many different aspects of the 
ce program and at the same time to indicate the extent of some 
ializations within the field as a whole.” In the preface the 
ecifies that the book is intended as a basic or beginning text for 
school counselors. 
general, the content of the book attempts to give the guidance 
(particularly the secondary school teacher) an over-view of the 
ce field including purposes, techniques, and administration of the 
program. Although the book fulfills its avowed aim to a large 
_ the quality of the chapters varies widely and some overlap is 
as is often the case with such symposia. Aside from certain minor 
e best chapters appear to be: the aims, objectives, and principles 
guidance movement (C. E. Erickson), interviewing techniques (S. 
in), therapeutic counseling (H. B. Pepinsky), helping pupils 
problems (P. L. Dressel), the community occupational survey 
h K. Wilson), the role of work experience (C. A. Weber), place- 
d follow-up services (L. O. Brockmann and L. Smith), and 
the guidance program (C. M. Horn). 
major criticisms of the volume can be summarized briefly: 
in quality and treatment of material as well as in level of 




difficulty (the chapter on therapeutic counseling may be heavy going for
the average high school counselor); a tendency to apportion too much
space to less important topics (case-study techniques and working with
home and community); lack of critical evaluation of occupational infor-
mation sources and of testing instruments; lack of functional information
about the world of work in the job levels where most secondary school
students will be employed.

The commendable features probably outweigh these criticisms, how-
ever. There is an admirable emphasis on the primary importance of
individual counseling. The many good, practical suggestions are of
great potential value to the guidance worker and should earn the authors
many a word of praise from hard-pressed counselors. Another fine
feature is the inclusion of many illustrative forms and excellent bibli-
ographies.

Perhaps the most discouraging point raised by this book is the 
tremendous store of material, skills, and information the counselor must 
have as minimum working equipment. Insofar as guidance techniques 
can be put across in book form to the beginning counselor, A Basic Text
for Guidance Workers is effective in attaining its aim.

William A. McClelland

Providence, R. I. 


Clarke, H. Harrison. The application of measurement to health and
physical education. New York: Prentice-Hall, Inc., 1945.
Pp. 415. $5.00.


This text is organized on a functional outline. After considering
some of the fundamentals underlying testing in health and physical educa-
tion, the author takes up in turn the measurement of physical fitness, of
social efficiency, and of physical education skills and appreciations.

In the section on physical fitness, there is the usual discussion of 
medical and sensory tests, of cardio-vascular tests, and a section on 
measurements and estimates of nutritional status. There is a well- 
written and sound treatment of the problems and possibilities associated 
with the technique of somatotyping. This is followed by an excellent 
discussion of measurement in the field of posture. 

The author gives a somewhat undue amount of space to the so-called
“physical fitness index,” which is the percentage residual from the regres-
sion value of a general strength test. In the opinion of the present


physical fitness, it does not deserve the reverence given here. The 
author again reverts to this test in the next part of his book. 




aid of the book given over to tests of social efficiency, the 
introduced a section that is new to testing textbooks in this 
‘He discusses—and gives numerous references to—a number of 
the field of personality studies. This marks an advance in this 
and these tests should be given more prominence in physical educa- 
d health studies as time goes on and as these tests improve. 
field of skill tests, because of the large numbers of such tests, 
has had to choose a few of the more important ones to de- 
and to give bibliographical references for the others. His choices 


‘on the whole, been a The same has been true of his presenta- 


s into a chapter. 
whole, the book presents a number of fresh viewpoints and 


C. H. McCloy 


C. Measurement in Today’s Schools. 2nd Ed.; New York: 
tice-Hall, Inc., 1947. Pp. xviii+597. $4.50. 

J.C. Chapter Exercises and Tests to Accompany Measurement in 
8 Schools. 2nd Ed., New York: Prentice-Hall, Inc., 1947. 
+74. 


second edition of this elementary textbook in educational meas- 
; appears six years after the first edition. The organization, 
nd section headings, and approximately ninety-five per cent of 
tent remain unchanged. In general the references have been 
nt t up to date and the content modified sufficiently to incorporate 
_ The chapter tests which appeared at the end of each chapter in 
edition have been supplemented by appropriate problems and 
n a consumable workbook to accompany the text. 

text with its accompanying exercise book is designed specifically 
t course in educational measurement. It is well organized to 
the beginning teacher and partially trained school administrator an 
d insight into the theory of measurement, descriptive statistics 
vidual differences as they relate to school organization and in- 
Almost one-half of the text is devoted to the uses of measure- 
| motivation, learning, diagnosis, marking, grouping, promotion, 
and evaluation. Appropriate emphasis is placed on the con- 




While the approach to the use of measurements is functional and in- 
tegrative it is also traditional and uncritical. In the cataloguing of re- 
search the point of view is that of the professor of measurement rather 
than that of the director of school organization and learning. The extent 
to which our present knowledge of individual and trait differences in the 
schools points to new uses of measurement and needed reforms in school 
organization and practice is not sensed. Nevertheless these two books 
rate high among the teachable books available in the field. 


Walter W. Cook 
University of Minnesota 


Erratum 


In the December 1948 issue of the Journal of Applied Psychology, an 
error occurred in the article, “The Effectiveness of Intelligence Tests 
in the Selection of Workers” by E. E. Ghiselli and C. W. Brown. On 
page 576 the last eight lines of type should have been inserted following 
the first line on page 577. 


Erratum 


In the December 1948 issue of the Journal of Applied Psychology,
under New Books, books by the following authors: H. L. Goldberg, K.
Goldstein, D. T. V. Moore, Strauss and Lehtinen, L. R. Wolberg, and
L. Szondi, were erroneously listed as being published by a book seller
by the name of M. W. Drexler. The real publisher is New York: Grune
and Stratton, Inc.


New Books, Monographs, and Pamphlets

Books, monographs, and pamphlets for listing and possible review should be sent to
Donald G. Paterson, Editor, Department of Psychology, University
of Minnesota, Minneapolis 14, Minnesota


psychosomatic medicine. Franz Alexander and Thomas M. 
Editors. New York: Ronald Press Co., 1948. Pp. 568. 


ical apparatus: a classified bibliography. T. G. Andrews. 
ological Monographs No. 289. Washington, D. C.: American 
ological Association, 1948. Pp. 38. 
of German prisoners of war: a study of the dynamics of national- 
ic followership. H. L. Ansbacher. Psychological Mono- 
s No. 288. Washington, D. C.: American Psychological Asso- 
m, 1948. Pp. 42. 
of psy . Edwin G. Boring, Herbert S. Langfeld, and 
y P. Weld. New York: John Wiley and Sons, Inc., 1948. Pp. 


trends in clinical psychology. A. W. Combs, et al. New York: 
s New York Academy of Sciences, 1948. Pp. 62. 5 
srican woman in modern marriage. Sonya Ruth Das. New 
tk: Philosophical Library, 1948. Pp. 185. $3.75. ; 
and deafness. Hallowell Davis, Editor. New York: Murray 
ooks, Inc., 1948. Pp. 496. $5.00. 
tion of the level of aspiration experiment to the study of personality. 
vile K. Escalona. New York: Bureau of Publications, Teachers 
Columbia University, 1948. Pp. 132. $2.10. 
of training films in department and specialty stores. Harry M. 


Boston: Harvard Business School, 1948. Pp. 147. $1.50.
e psychiatry. Leland E. Hinsie. New York: The Macmillan
Co., 1948. Pp. 359. $4.50.

job. Fritz Kaufmann. New York: Harper and Brothers, 1948. 
of thumb- and finger-sucking in infants. Mary S. Kunst. Psycho- 

Monographs No. 290. Washington, D. C.: American Psycho- 


; ional personnel work. Corinne LaBarre. 
Council on Education, 1948. Pp. 54. 




The commonsense psychiatry of Dr. Adolf Meyer. Alfred Lief. New 
York: McGraw-Hill Book Co., Inc., 1948. Pp. 677. $6.50. 

The strategy of job finding. George J. Lyons and Harmon C. Martin.
New York: Prentice-Hall, Inc., 1948. Pp. 408. $3.25. 

The open self. Charles Morris. New York: Prentice-Hall, Inc., 1948. 
Pp. 179. $3.00. 

Educational psychology. Harvey A. Peterson. New York: The Mac- 
millan Co., 1948. Pp. 550. $4.00. 

Training employees and managers. Earl G. Planty, William S. McCord, 
and Carlos A. Efferson. New York: The Ronald Press Co., 1948. 
Pp. 278. $5.00. 

The emotions. Jean-Paul Sartre. New York: The Philosophical Library, 
1948. Pp. 97. $2.75. 

The teacher as counselor. Donald J. Shank, et al. Washington, D. C.: 
American Council on Education, 1948. Pp. 48. $.75. 

The legend of Henry Ford. Keith Sward. New York: Murray Hill 
Books, Inc., 1948. Pp. 550. $5.00. 

Van Allyn methods manual. Keith Van Allyn. Palo Alto, Calif.: Surveys, 
ape 1948. Pp. 117. Manual plus 25 Qualification Inventories 
$7.50. 

Cybernetics. Norbert Wiener. New York: John Wiley and Sons, Inc.,
1948. Pp. 194. $3.00. 

Pediatrics and the emotional needs of the child. Helen L. Witmer, Editor. 
New York: The Commonwealth Fund, 1948. Pp. 180. $1.50. 

Diagrams of the unconscious. Werner Wolff. New York: Grune and
Stratton, Inc., 1948. Pp. 423. $8.00.

Personnel management and industrial relations. Third Edition. Dale 
Yoder. New York: Prentice-Hall, Inc., 1948. Pp. 894. $5.00. 
Exploring individual differences. Committee on Measurement and 
Guidance. Washington, D. C.: American Council on Education, 

1948. Pp. 110. $1.50. 

Exploring a first grade curriculum. New York Board of Education.
Publication No. 30. New York: Bureau of Reference, Research and 
Statistics, Board of Education, 1947. Pp. 104. $.50. 

Influencing and measuring employee attitudes. Personnel Series Number 
pr New York: American Management Association, 1948. Pp. 55. 

Problems and experience under the labor-management relations act. Per-
sonnel Series Number 115. New York: American Management
Association, 1948. Pp. 35. $.75.

New patterns of employee relations. Personnel Series Number 117. New 
York: American Management Association, 1948. Pp. 50. $1.00. 


————— 


Journal of Applied Psychology


April, 1949 


Quantification of an Industrial Employee Survey. 
I. Method * 


Frank J. Harris †
Division of Education and Applied Psychology, Purdue University 


The research project to be described is an attempt to develop a new
technique to measure quantitatively the morale of industrial employees.
In the past, two general approaches have been made to this problem.
One is an adaptation of the attitude scaling technique first described
by Thurstone and Chave (3). By means of this technique a score is
obtained which indicates the general attitude of employees toward the
company for which they work. However, this type of scale does not
provide management with very much insight regarding the attitudes of
employees toward specific policies or practices.

The second approach consists of asking a number of questions about
specific aspects of company policy. This type of employee opinion
survey does provide management with the opportunity to obtain answers
to relatively specific questions with which it is often concerned. On the
other hand, the answers determined from such a survey do not yield
over-all attitude scores of the type required if departmental, tenure,
and other similar comparisons are to be made.

The present study is an extension of the general employee opinion
survey approach. Briefly, the questions on the survey are statistically
treated according to the generally accepted principles of test construction
and standardization, thus combining the practical merits of the opinion
survey with the quantitative aspects of the attitude scale.

We have been speaking of morale as if it were a term the definition
of which was generally agreed upon. Such, of course, is not the case.
However, the adequacy of this study will not stand or fall on terminologies
* This article is based on the author's dissertation entitled “The Development of a
e Morale Score from a Generalized Industrial Employee Survey” submitted
to the faculty of Purdue University in partial fulfillment of the requirements for the
degree of Doctor of Philosophy, August, 1948. The dissertation was directed by Dr.

† The author is now serving as Research Psychologist, Division of Commissioned
c Health Service, Federal Security Agency, Washington, D. C.




and for our purposes it has seemed unnecessary to go beyond an opera- 
tional definition of morale. The definition adopted therefore is: ‘Morale 
is the attitude of the employee, as expressed on an anonymous question- 
naire, toward the company for which he works, with a favorable attitude 
representing relatively high morale and an unfavorable or neutral attitude 
representing a relatively lower level of morale.” 


Development of the Morale Scale 


Description of the original survey. The data were obtained from a
survey conducted early in 1948 by the Victor Adding Machine Company
in Chicago, Ill. The forms were mailed to the home addresses of all 
employees of the company. The forms were returned by the employees 
directly to Purdue University. The individual employees could not be 
identified in any way. The employees had been informed in advance 
of the nature of the project and their cooperation was requested in filling 
out and mailing the forms in an enclosed self-addressed, stamped envelope. 
Approximately 800 questionnaires representing 75% of the employees 
were returned. All of the data were coded and punched on I.B.M. cards 
for more convenient analysis. An analysis of the percentages of em- 
ployees in various categories responding to each alternative was made 
quite independently of this study and forwarded to company officials. 

Initial screening of items. Not all of the items in the original ques- 
tionnaire could be presumed to be directly measuring attitude toward 
management or toward the company. Accordingly 48 questions were
selected from the total which were considered to be appropriate to 
the study at hand. These 48 items with their alternative responses 
were then reproduced and presented to 10 judges with the following 
instructions: 


The statements below are part of a questionnaire administered to employees
... Kindly check the one response to ...
... most strongly represents a favorable attitude ...

The judges were all advanced students in or professors of industrial 
psychology. It was arbitrarily determined that items on which there 
was 80% agreement or better would be retained at this stage. Forty-six 
items met this criterion. In fact there was unanimous agreement on
42 items, 90% agreement on one, and 80% agreement on two. On one
item of the questionnaire, which dealt with the filling of job vacancies,
there were six possible responses. On this item, seven judges selected
one response, while the three other judges selected a second response.
It seemed logically justifiable to retain this item by considering either of
these two alternatives as favorable.
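
The judge-agreement screening just described can be sketched in a few lines. This is only an illustration under stated assumptions: the item names and the judges' choices below are hypothetical, and the article does not say how agreement was tallied beyond the 80% rule.

```python
from collections import Counter

def modal_agreement(choices):
    """Return the most frequently chosen response and the share of judges choosing it."""
    response, count = Counter(choices).most_common(1)[0]
    return response, count / len(choices)

def screen_items(judgments, threshold=0.80):
    """Keep items whose most popular response reaches the agreement threshold."""
    kept = {}
    for item, choices in judgments.items():
        response, share = modal_agreement(choices)
        if share >= threshold:
            kept[item] = response
    return kept

# Hypothetical choices from 10 judges (not data from the article)
judgments = {
    "item_a": ["yes"] * 10,                # unanimous
    "item_b": ["yes"] * 9 + ["no"],        # 90% agreement
    "item_c": ["no"] * 8 + ["yes"] * 2,    # 80% agreement
    "item_d": ["r4"] * 7 + ["r6"] * 3,     # 70% on one response, 30% on another
}
kept = screen_items(judgments)
```

Under a strict 80% rule a 7-to-3 item such as `item_d` drops out; the article retained its one such item by treating either of the two chosen responses as favorable, which is a judgment made outside this rule.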




Of the questionnaires returned, those on which the respondent had failed to
answer all of the biographical items were discarded. The remaining
753 were divided into two groups. These groups were randomly selected
after the following stratifications had been imposed: male or female;
married or single; weekly or hourly-paid; worker, set-up man or super-
visor; length of service. One group, consisting of 377 employees, was
considered the experimental group; the other, consisting of 376 employees,
was held out for further analysis at a later stage.
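
A stratified random division of this kind might be sketched as follows. The record fields, the seed, and the rule for placing the odd case in each cell are assumptions made for illustration; the article does not describe the mechanics of the split.

```python
import random
from collections import defaultdict

def stratified_halves(records, strata_keys, seed=0):
    """Split records into two halves, balancing every combination of strata."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for rec in records:
        cells[tuple(rec[k] for k in strata_keys)].append(rec)

    first, second = [], []
    for key in sorted(cells):
        cell = cells[key]
        rng.shuffle(cell)                      # random order within the cell
        half = len(cell) // 2
        first.extend(cell[:half])
        second.extend(cell[half:2 * half])
        if len(cell) % 2:                      # odd case goes to the smaller half
            target = first if len(first) <= len(second) else second
            target.append(cell[-1])
    return first, second

# Hypothetical respondents: 753 records with two of the five strata
records = [
    {"id": i,
     "sex": "M" if i % 3 else "F",
     "pay": "weekly" if i % 4 == 0 else "hourly"}
    for i in range(753)
]
experimental, holdout = stratified_halves(records, ["sex", "pay"])
```

With 753 records the two halves come out as 377 and 376 cases, matching the group sizes reported in the text.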
Item analysis. A key card was prepared on which was punched the
response to each item which represented “high-morale.”¹ The cards of
the experimental group were scored in terms of the total number of high-
morale responses. The 100 highest scoring employees and the 100 lowest
scoring employees were selected and the degree of internal consistency
of each item was determined in terms of discrimination or D-values
using Lawshe’s nomograph (1). Items having a D-value of 1.0 or
higher were arbitrarily retained to comprise the scale. The 36 items
which met this criterion, with their respective D-values, and with the
high-morale response indicated, are presented in Table 1. It will be noted
that, without any such intention on the part of the author, these items
embrace many of the factors which various investigators have reported
to be related to industrial morale. It is also worthy of mention that all
of the final 36 items are those which the judges had previously agreed
upon unanimously, provided either of two responses is accepted for
item 36.
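
The upper-group versus lower-group comparison behind the item analysis can be illustrated as below. This sketch does not reproduce Lawshe's nomograph: as a stand-in it takes the difference between the normal deviates of the two groups' proportions, and that index, the clipping constant `eps`, and the toy answers are all assumptions.

```python
from statistics import NormalDist

def total_scores(answers, key):
    """Count keyed ("high-morale") responses for each respondent."""
    return {rid: sum(resp.get(item) == keyed for item, keyed in key.items())
            for rid, resp in answers.items()}

def extreme_groups(scores, n):
    """Return the n highest-scoring and the n lowest-scoring respondent ids."""
    ranked = sorted(scores, key=lambda rid: scores[rid])
    return ranked[-n:], ranked[:n]

def discrimination(answers, item, keyed, high, low, eps=0.01):
    """Stand-in index: z(share keyed in high group) minus z(share in low group)."""
    def share(group):
        p = sum(answers[rid].get(item) == keyed for rid in group) / len(group)
        return min(max(p, eps), 1 - eps)   # keep the normal deviate finite
    z = NormalDist().inv_cdf
    return z(share(high)) - z(share(low))

# Hypothetical key and answers for four respondents
key = {"q1": "yes", "q2": "yes", "q3": "yes"}
answers = {
    "r1": {"q1": "yes", "q2": "yes", "q3": "yes"},
    "r2": {"q1": "yes", "q2": "no", "q3": "yes"},
    "r3": {"q1": "no", "q2": "yes", "q3": "yes"},
    "r4": {"q1": "no", "q2": "no", "q3": "yes"},
}
scores = total_scores(answers, key)
high, low = extreme_groups(scores, n=1)
d_q1 = discrimination(answers, "q1", "yes", high, low)
d_q3 = discrimination(answers, "q3", "yes", high, low)
```

An item answered the same way by both extreme groups, such as `q3` here, gets an index of zero and would be dropped under any positive cutoff.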
Reliability of the scale. At this point the experimental group of cards
upon which the scale had been developed and analyzed was removed from
further consideration. The second group which had been held out until
this time was now scored in terms of the 36-item key. The odd-even
reliability coefficient for this group was determined to be .72; correcting
by means of the Spearman-Brown formula for the complete scale of 36
items yielded a reliability coefficient of .84.
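
The Spearman-Brown step can be checked directly. The general form of the correction is r' = k r / (1 + (k - 1) r), with k = 2 when two half-scales are combined into the full scale. The helper that forms the odd and even half-scores is a hypothetical illustration of the split, not the author's card-sorting procedure.

```python
def odd_even_scores(item_scores):
    """Split one respondent's item scores (in item order) into odd and even half totals."""
    odd = sum(item_scores[0::2])    # items 1, 3, 5, ...
    even = sum(item_scores[1::2])   # items 2, 4, 6, ...
    return odd, even

def spearman_brown(r_half, factor=2.0):
    """Estimate the reliability of a test lengthened by `factor`."""
    return factor * r_half / (1 + (factor - 1) * r_half)

corrected = spearman_brown(0.72)    # 1.44 / 1.72
```

Rounded to two places, the corrected value agrees with the coefficient of .84 reported above.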


Analysis of Morale Scores 


Once the morale scale had been developed and was found to have
satisfactory reliability, it was possible to proceed with an analysis of the
scores of various categories of employees. The results of this analysis
are shown in Table 2. In all of the comparisons presented the significance
¹ In this study, responses chosen by the judges as representing the most favorable
attitude toward the company are termed “high-morale” responses. As used here the
term may be considered as equivalent statistically to the term “correct” as it is cus-
tomarily employed in item analyses of test items.




of differences was determined for group means and for group standard
deviations. The significance of mean differences is expressed in terms
of Fisher’s t statistic; the significance of standard deviation differences
is expressed in terms of Fisher’s F-ratio as tabled by Snedecor (2). A
t value which is significant at the 10% level of confidence is indicated
by an asterisk. A t or F value which is significant at the 5% level is


Table 1

Final Morale Scale Items and Discrimination Values

Item                                                                  D-value

What Is Your Opinion of Your Boss (the Man You Report to)
 1. Does he “know his stuff”?                                           1.20
 2. Does he play favorites?                                             1.35
 3. Does he keep his promises?                  Yes..x..No......        1.50
 4. Does he pass the buck?                      Yes......No..x..        1.10
 5. Does he welcome suggestions?                Yes..x..No......        1.50
 6. Is he a good teacher?                       Yes..x..No......        1.70
 7. Do the workers know more than he does?      Yes......No..x..        1.50
 8. Does he set a good example?                 Yes..x..No......        1.50

Do You Feel You Understand the Following Provisions
of the Employees’ Security Fund?
 9. How the money is divided among the
    employees?                                  Yes..x..No......        1.75
10. How the Company decides how much goes to
    this fund?                                                          1.80
11. How the Security Fund money is invested?                            2.30
12. How much you get if you leave, die or
    retire?                                     Yes..x..No......        1.70

13. Do you feel that you are receiving considerate
    treatment here?                             Yes..x..No......        1.00
14. Do you feel top management is interested in the
    employees?                                  Yes..x..No......        1.90
15. Have you ever recommended this Company as a
    place to work to a friend?                  Yes..x..No......        1.25
16. Do you feel you have a good future with this
    Company?                                    Yes..x..No......        1.85
17. What do you think of working conditions here as
    compared with other plants?
    Above average..x..Average......Below average......                  1.60
18. How do you think your ... (gross earnings before
    deductions) compare with that paid in other
    companies for the same type of work?
    Better here..x..About the same......Lower here......                1.20
Give Careful Thought to the Following List of Company Policies Affecting Employees
Working Conditions, and Employee Benefits. Think About
Each Item ... Then Check What You Think

Table 1 (Continued)

ince for promotion
cal Department
ployee Committees

How do you find your fellow workers:
    Friendly..x..Unfriendly......Indifferent......
What does your family think of this Company?
    Good place to work..x..No opinion......Poor place to work......
How do you like your present job?
    Very much..x..
    Pretty good......
Do you think the employees have confidence in the operating
heads of the business?
    Most employees do..x..More than half of them......
    About half......Less than half......Few of them......
How do you feel your opportunities in this Company compare
with those with your last employer?
    Better..x..Not so good......About the same......
    Never worked elsewhere......
What are your work plans for the future?
    Hope to remain here..x..Plan to work only a short time......
    Do not plan to work......I have other work plans......
When desirable job vacancies arise, how do you feel they are
usually filled?
    By employing people outside the Company
    By promoting favored employees who are not especially qualified
    By giving first chance to employees of long service
    By taking the most qualified person
    I am not sure how they are filled
    By both ability and service

D-value
1.80
1.70
1.35
1.30




of the sexes do not differ significantly, the men are significantly more 
variable than the women. 

Marital status. All married employees combined yield a significantly 
higher mean morale score than all single employees combined. In an 
attempt to determine whether either sex might account for this difference, 
a further breakdown was made. It may be seen that differences between 
the married and the single may be attributed primarily to the significantly 
higher scores obtained by married men as compared with single men.


Table 2

Comparisons of Morale Scores Among Employee Sub-Groups

Group                        N     Mean    S.D.   Comparison      t         F

Sex
  Male                      253   26.99    6.41                  .13      1.36**
  Female                    123   26.91    5.49
Marital status
  Married                   244   27.59    6.11
  Single                    132   25.80    5.99
  Married men               178   27.66    6.28   MM vs SM      2.56***   1.06
  Single men                 75   25.39    6.47   MW vs SW       .97      1.20
  Married women              66   27.39    5.69   MM vs MW       .32      1.21
  Single women               57   26.35    5.20   SM vs SW       .94      1.55**
Type of job
  Worker                    241   26.83    6.01   W vs S         .17      1.17
  Supervisor                101   26.96    6.51   W vs S-U      1.02      1.12
  Set-up man                 34   27.91    5.67   S vs S-U       .80      1.32
Method of pay
  Weekly                     98   27.76    6.26
  Hourly                    278   26.48    6.48
  Weekly-paid worker         53   26.79    7.02                           1.16
  Hourly-paid worker        188   26.84    5.70
  Weekly-paid supervisor     43   28.77    5.17                           1.86**
  Hourly-paid supervisor     58   25.62    7.05
Length of service
  Under 6 months             50   25.10
  6 mos. to 1 year                                              2.22**    1.64**
  1 to 2 years               87   26.25    5.67                 1.66*     1.09
  2 to 5 years               65   27.60    6.52
  5 to 10 years              62   26.26    5.80
  Over 10 years              26   30.65    5.94                 3.14***   1.05

* Significant at the 10% level.
** Significant at the 5% level.
*** Significant at the 1% level.
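
The t and F entries can be recomputed from the tabled N, mean, and S.D. values. The sketch below uses the Male and Female rows; the article does not state which standard-error formula was used, so the unpooled form here is an assumption, as are the function names.

```python
from math import sqrt

def t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """t for a difference between two group means, using an unpooled standard error."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

def f_ratio(sd1, sd2):
    """Ratio of the larger variance to the smaller variance."""
    v1, v2 = sd1 ** 2, sd2 ** 2
    return max(v1, v2) / min(v1, v2)

# Male vs. Female rows of Table 2
t_sex = t_from_summary(253, 26.99, 6.41, 123, 26.91, 5.49)
f_sex = f_ratio(6.41, 5.49)
```

Rounded to two places these give t = .13 and F = 1.36, in line with the reading that the sexes do not differ in mean score while the men's scores are the more variable.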




Marital status does not appear to affect the morale scores of women
employees significantly. It is also evident that the greater homogeneity
of women’s scores is due more to the single than to the married women.

Differences in type of job. Employees were asked to indicate whether
their job was best classified as that of a worker, supervisor, or set-up
man. Comparative morale scores were determined for these three general
categories. None of the differences between these groups is significantly
greater than chance alone could reasonably explain, contrary to what
one might expect on the basis of previously reported findings.

Weekly vs. hourly-paid jobs. The scores of all employees who were
paid a weekly salary were compared with the scores of all employees who
were paid on an hourly rate basis. Since there was a tendency for weekly-
paid employees to score higher than hourly-paid employees, further anal-
yses of the data were made to determine whether workers or supervisors
might account for this trend.² The results of this analysis indicate that
the weekly-paid supervisors account for the higher morale scores of
weekly-paid employees in general. Also, as a group, the scores of weekly-
paid supervisors are more homogeneous than the scores of any com-
parable group.

Length of service. Employees were asked to indicate whether they
had worked for the company (1) under six months, (2) from six months
to one year, (3) from one to two years, (4) from two to five years, (5) from
five to 10 years, or (6) over 10 years. Scores were analyzed in terms of
these six categories. The results indicate that morale scores are lowest
and most heterogeneous under six months. From six months to 10
years they appear to fluctuate to an insignificant extent. After 10 years
they again take a significant swing upwards.




Summary and Conclusions

An attempt was made to develop a quantitative morale scale by
treating responses to an industrial employee survey according to standard
test development procedures. A questionnaire containing specific items
of interest to management was filled out anonymously by approximately
75% of the employees of a Midwestern manufacturing company and
mailed directly by each employee to Purdue University.

Questions which were obviously related to morale were judged by
competent individuals in terms of the one alternative response which
represented a favorable attitude toward the company. The 46 items
upon which there was 80% or higher agreement constituted the original




² Set-up men were not included in this analysis since 32 of the 34 employees in this
category were weekly-paid.






. The questionnaires were then separated into two stratified random 
samples containing 377 and 376 cases respectively. One of these samples 
was scored and a high and low group of 100 cases each, based on total 
score, were selected. The per cent of the high scoring group and the 
per cent of the low scoring group responding to each item was determined. 
From these percentages the discrimination value of each item was com- 
puted. The 36 items having D-values of 1.0 or higher constituted the 
final scale. The sample which had been held out was scored in terms of 
the 36-item key. A corrected reliability coefficient of .84 was obtained 
by the split-half (odd-even) method. 

The results of an analysis of the morale scores of various employee 
sub-groups are presented. 

It should be pointed out and emphasized that the results of the anal- 
yses reported here are specific to the data from the particular company 
which cooperated in the study. Any attempt to generalize from these 
results as to the relative levels of morale among various groups of in- 
dustrial employees would be hazardous. 

No attempt has been made to explain the differences or lack of dif- 
ferences found. Such differences can be most safely interpreted by 
individuals thoroughly familiar with the plant from which the data were 
obtained. The results of the survey give such individuals a clue as to
the focal points of the industrial relations program which might possibly 
call for special attention. 

The methodology used in developing the scale may, on the other
hand, be profitably applied to any industrial situation where data of the
type described are available. The advantages accruing to any given
company from this approach are several. In addition to the types of
information usually obtained from an employee survey of this sort it
becomes possible to:


1. Obtain a reliable and quantitative estimate of the relative morale 
levels of various groups of employees such as workers and supervisors, 
old and new employees, married and single employees. 

2. Obtain a reliable indication of those areas in which a change in 
policy would seem desirable. 

3. Secure comparable data with which to compare the state of morale
from time to time and thus reflect the effect of any changes introduced
by management.

4. Accomplish the above with little more effort than is involved in
the treatment of the ordinary attitude survey.


Much more could be accomplished by further extensions of this 
approach. Working cooperatively through a common consultant 4 




panies would be able to obtain an indication of the level 
their employees as compared with employees of other com- 
it were possible also to relate the morale score of the worker 
sor under whom he works, industry would be better able to 
| one of the major sources of differences in employee attitude. 

that the technique reported here will lead to further investiga- 


References 

H., Jr. A nomograph for estimating the validity of test items. J. appl. 
0 , 1942, 26, 846-849. 

n Q. W. Statistical methods. Ames, Iowa: Iowa State College Press, 1946. 


L. L., and Chave, E. J. The measurement of attitude. University of 
Press, 1929. 


The Quantification of an Industrial Employee Survey. 
II. Application * 


Frank J. Harris † 
Division of Education and Applied Psychology, Purdue University 


In a previous paper,¹ the author described a technique for developing 
a quantitative morale score by applying the principles and methods of 
test construction to an industrial employee survey. Advantages claimed 
for this approach are that the employer can secure comparable data with 
which to compare the state of morale from time to time, can obtain a 
reliable indication of those specific areas in which a change of policy might 
seem desirable, and is provided with a measure of the effect of any 
changes that are instituted. The present paper attempts to illustrate 
these advantages. 

The survey from which the morale scale was developed was conducted 
in 1948. A similar survey had been conducted in 1945 for the same com- 
pany, by the same consultant, and in the same manner. Of the 36 items 
selected from the 1948 survey to constitute the morale scale, 35 had 
appeared on the 1945 survey with minor modifications in wording in a 
few instances. In the earlier survey, 555 or 65% of the employees re- 
turned the questionnaire. Of the total respondents 60% were men and 
40% were women. In the later survey, 800 or 75% of the employees 
responded of whom 66% were men and 34% women. Thus the two 
groups can be expected to be reasonably comparable in sex ratio and 
employee representation. 

The results of the two surveys were examined to determine the direction 
and extent of any changes which might have occurred in the intervening 
three-year period. A comparative study of this sort could be made in at 
least two general ways, depending upon the type of information desired. 
One way would be to score both sets of questionnaires in terms of the 
36-item key. From the obtained scores it would be possible to 


* This article is based on the author's dissertation entitled “The Development of a 
Quantitative Morale Score from a Generalized Industrial Employee Survey” submitted 
to the faculty of Purdue University in partial fulfillment of the requirements for the 
degree of Doctor of Philosophy, August, 1948. The dissertation was directed by Dr. 

† The author is now serving as Research Psychologist, Division of Commissioned 
Officers, Public Health Service, Federal Security Agency. 

¹ Harris, F. J. The quantification of an industrial employee survey. I. Method. 
J. appl. Psychol., 1949, 33, 103-111. 




compare the morale of employee sub-groups at the earlier and at 
the later time. Another way would be to compare the responses to each 
question by determining the per cent of employees who responded favorably 
to the item at each administration of the questionnaire form. The latter 
type of analysis was made in this study; for each item the difference was 
determined between the per cent of respondents who indicated a favorable 
response in 1945 and the per cent who indicated a favorable response in 
1948. The level of significance of the differences between these 
percentages was determined by the computation of t-values. 
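
The paper does not give the formula behind these t-values. One common choice for comparing two independent percentages is a critical ratio based on the pooled proportion, and the sketch below assumes that form. The function name and the 25 per cent figure are hypothetical; 555 and 800 are the respondent counts reported above, and item-level counts may have differed.

```python
import math


def percent_difference_ratio(p1: float, n1: int, p2: float, n2: int) -> float:
    """Critical ratio for the difference between two independent proportions,
    using the pooled proportion to estimate the standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        return 0.0
    return (p2 - p1) / se


if __name__ == "__main__":
    # 25% favorable in 1945 is a made-up value used only for illustration;
    # 37% favorable in 1948 is a figure quoted in the text.
    ratio = percent_difference_ratio(0.25, 555, 0.37, 800)
    print(f"critical ratio = {ratio:.2f}")
```

A positive ratio indicates a shift in the favorable direction from the first survey to the second. With samples of this size the ratio can be referred to the normal curve.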


1. On 19 of the 35 items there was a shift in the high-morale or favorable 
direction at the 1% level of significance or better. 

2. There was a favorable shift on three items at the 2% to 10% level 
of significance. 

3. Only one item, “Does your boss play favorites?”, showed an unfavorable 
change in attitude (at the 4% level of significance). 

4. The remaining 12 items revealed slight changes in either direction 


5 obtained of the general level of attitude toward each item or policy 
represented thereby. For example, although there was a markedly 


responding favorably in 1945 to 37% responding favorably in 1948) the 
“level” of attitude remained rather low. On the other hand, 90% of the 
employees liked the group insurance plan in 1945 and in 1948. 

The final interpretation of the findings and the uses to be made of 
them rest on decisions of the sponsoring company. The changes in morale 
or attitude were undoubtedly influenced to some extent by factors 
external to the company, e.g., conversion from war to peacetime production. 
However, management now has reliable indices of how certain of its 
practices have been received by the employees, has some definite clues 
as to what effects its policy changes have had on morale, and is in a better 
position to chart its future course in personnel relations. 


Received September 23, 1948. 


* Complete data are on file in the Purdue University Library and in order to reduce 
printing costs a summary prepared in table form has been deposited with the American 
Documentation Institute. Order Document 2625 from American Documentation 
Institute, 1719 N St., N.W., Washington 6, D. C., remitting $.50 for microfilm (images 
1 inch high on standard 35 mm. motion picture film) or $.50 for photocopies (6 by 8 
inches) readable without optical aid. 


“Item Analysis” Versus “Scale Analysis” * 


Philip H. Kriedt and Kenneth E. Clark 
University of Minnesota 


In the last few years Dr. Louis Guttman of Cornell University has 
developed a new and increasingly popular technique for determining 
whether or not a test or attitude scale possesses unidimensionality (8). 
This paper presents a comparison of this technique of scale analysis with 
two older methods of item analysis, in order to determine the comparative 
values of each method for selecting from a pool of items those which 
belong together, either in terms of their internal consistency, or in terms 
of their unidimensionality. 

The three methods herein compared are: (1) the Cornell Technique 
of Scale Analysis, in which the essential statistic is reproducibility, and in 
which emphasis is placed on the ability to predict or “reproduce” the 
response of an individual to every item of a scale in terms of his total 
score on that scale; (2) one common form of item analysis, specifically, 
that in which the item-responses made by persons in the top twenty-seven 
per cent of the distribution on total score are compared with the 
responses made by persons in the bottom twenty-seven per cent on total 
score, using the phi coefficient as a measure of the correlation between 
item and total score; and (3) the determination of inter-correlations be- 
tween items as a means of selecting those which are measuring the same 
thing, using as the measure of relationship the tetrachoric correlation 
coefficient. 

A 72-item Likert-type questionnaire on attitudes toward Negroes, 
made up of items with 3, 5, and 7 categories of response, was administered 
to 183 students in an elementary course in Social Science at the University 
of Minnesota. In general, the content of these items was extremely 
heterogeneous. The scale was scored initially by assigning arbitrary unit 
values to each of the response categories, as in the usual Likert-type scale. 

Since the analytic methods being compared would be affected con- 
siderably by the methods used to reduce all item responses to dichotomies, 
some preliminary work was done to determine how best to group item 
responses. Response categories were first combined so as to maximize 
the per cent reproducibility of each item and, whenever possible, to make 


* The writers are indebted to the University of Minnesota Graduate School for the 
research grant that made this study possible. 




each category have less error than non-error, in accordance with the 
requirements of the Scale Analysis methods. However, this approach did 
not seem particularly promising for the development of a good scale, since 
dichotomizing items so as to maximize reproducibility, without regard 
to other item characteristics, tends to provide dichotomies with high 
modal response frequencies; that is, items which are answered the same 
way by a large proportion of the respondents. Items were also 
dichotomized, therefore, on the basis of item correlation with total score. The 
top 27 per cent and the bottom 27 per cent on total score were selected, 
and their responses to every possible dichotomy of response categories 
were compared, using phi coefficients obtained from Jurgensen's tables 
(9). That combination which maximized the phi coefficient was used. 
These dichotomies had few high modal response frequencies, since the 
method used tends to penalize items deviating markedly from a 50-50 
split. 
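
The second dichotomizing procedure can be sketched in Python. The writers read phi from Jurgensen's tables rather than computing it, so this code is an illustration and not their procedure. The function names are hypothetical. Trying only cuts between adjacent ordered categories is an assumption about what "every possible dichotomy" meant.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table: a, b = favorable/unfavorable counts in the high
    group; c, d = favorable/unfavorable counts in the low group."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return 0.0 if denominator == 0 else (a * d - b * c) / denominator


def extreme_group_phi(item, totals, fraction=0.27):
    """Phi between a dichotomized item (0/1) and membership in the top versus
    bottom `fraction` of respondents on total score. Tied totals are split
    by position in the list."""
    n = len(totals)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: totals[i])
    low, high = order[:k], order[-k:]
    a = sum(item[i] for i in high)
    c = sum(item[i] for i in low)
    return phi_coefficient(a, k - a, c, k - c)


def best_cut(raw, totals, n_categories):
    """Try each cut between adjacent categories (coded 1..n_categories) and
    return the cut whose dichotomy (response >= cut) gives the largest phi."""
    cuts = range(2, n_categories + 1)
    return max(
        cuts,
        key=lambda cut: extreme_group_phi([int(r >= cut) for r in raw], totals),
    )
```

Because the two extreme groups are equal in size, phi is largest when an item splits them cleanly. A cut that leaves nearly everyone in one category cannot reach a high phi, which is the penalty on lopsided splits noted above.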

Ten items were found to have such low phi coefficients for any 
combination of responses that they were not included in the later analyses. 
For the remaining 62 items, all of which were now dichotomized, the 
following computations were made: all inter-item correlations, using the 
Cheshire, Saffir, and Thurstone computing diagrams (1) (method A); 
phi coefficients for each item versus total score on the 62-item dichotomized 
scale (method B); and the per cent reproducibility of each item as 
part of the 62-item scale (method C). In addition, all seventy-two 
items were carefully read by the writers, and twenty-seven items selected 
as representing, in the judgment of the writers, the primary factor being 
measured by the scale. All questionnaires were rescored for these twenty-seven 
items, and reproducibilities computed for each of the items using 
this new total score (method D). 

Four separate bases thus existed for the selection of items for a 
shorter, more unified scale. Using each method, a ten-item scale was 
constructed. These four 10-item scales will be referred to hereafter as 
scales A, B, C, and D. Scale A was made by selecting the ten items 
whose intercorrelations with each other would be maximized; scale B was 
made up of items with the highest correlation with the total score; scale 
C consisted of the 10 items having highest reproducibility in the 62-item 
computation; and scale D the 10 items having highest reproducibility 
selected from the special group of 27 items. The items in the two 
“reproducibility” scales (scales C and D) had no more error than non-error 
in each category, as required by the Guttman method. These two 
scales were almost identical, having eight of their ten items in common. 
None of the other scales, however, had more than three items in common. 
A statistical description of each of these four scales is presented in 
Table 1. For each scale is reported: (1) reproducibility when rescored 
for the ten-item scale; (2) phi coefficient indicating correlation between 
item response and total score for the ten-item scale (using top 27 per cent 
against bottom 27 per cent); (3) the median inter-item correlation between 
an item and the other nine items in the scale, using tetrachoric 
r's; and (4) the modal response frequency of items (i.e., the percentage 
of respondents who answered the item in the same way). The odd-even 
reliability (estimated from the Spearman-Brown prophecy formula) is as 
follows for each of the ten-item scales: A, +.90; B, +.91; C, +.83; and 


Table 1 


Item and Scale Characteristics of Ten Best Items on: Inter-Item Tetrachoric r's 
(Scale A), Correlation Between Item and Total Score (Scale B), Reproducibility 
Using Pool of 62 Items (Scale C), and Reproducibility Using Pool of 27 Items (Scale D) 

[Table body not recoverable from the scan. Its four panels gave, for scales A, B, C, 
and D, the distributions of reproducibilities, of phi coefficients (item vs. total 
score), of median tetrachoric r's, and of modal response frequencies.] 


Results 


The relative merits of each of the three methods of item selection and 
scale refinement are discussed below in terms of the data presented. 

Scale A (Inter-Item Tetrachoric Correlations). The selection of ten 
items from the pool of 62 items so as to maximize the median inter-item 
tetrachoric correlation coefficient produced a scale which does not quite 
meet Guttman's criterion of 90 per cent reproducibility (see Table 1). 
However, this scale does compare favorably with the other scales when 
examination is made of the phi coefficients for each item versus the total 
score, and of the modal response frequencies of the ten items. Thus, 
the use of this method of item selection yields a relatively good scale in 
spite of the fact that the tetrachoric r is not an appropriate statistic to 
use with data of this kind. Had a more appropriate measure of 
correlation been used, one would assume that this method for selecting items 
would have yielded the best scale of the four. For the writers to have 
used another statistic would have made the labor and expense of 
computation with a matrix of 62 items prohibitive. The writers therefore 
fell back on the same solution used by many others and resorted to the 
Thurstone et al. tetrachoric computing diagrams as a ready and 
reasonably approximate estimate of the relationship between items. Some 
items, however, have extreme response splits, so that the value of r could 
only be estimated, or became a meaningless value of plus or minus 1.0. 

Scale B (Top versus Bottom 27 Per Cent). Comparing the item 
responses of extreme top and bottom groups fails to work as a method of 
producing a scale having unidimensionality, as defined by Guttman. It 
does, however, produce a scale having high internal consistency as 
measured by the odd-even reliability coefficient (.91), or median item versus 
total score phi coefficient (.79). Its items, moreover, discriminate well 


1 The disadvantages of the use of tetrachoric correlations with attitude 
scale items are discussed in Gage (7). That tetrachoric r's give different values than are 
obtained with other statistics was demonstrated empirically for one ten-item matrix. 
Greatest discrepancies occur when the modal response frequency approaches 100 per cent. 



over a wide range, being more satisfactory in this respect than the items 
producing higher reproducibilities. This method has the additional 
advantage of being less laborious, and of involving less judgment and 
more mechanical selection of items, than the Guttman methods. 

Scales C and D (Reproducibility). If one accepts Guttman’s definition 
of unidimensionality of a scale, one requires among other things that the 
scale have a per cent reproducibility of 90 per cent or more. The only 
scales which meet this requirement are scales C and D. Furthermore, 
practically the same results were obtained when items were selected from 
a pool of 62 heterogeneous items as when from a much more homogeneous 
group of 27 items. These results obtain even though the Cornell Tech- 
nique of Scale Analysis is not designed primarily as a method of item 
analysis and item selection, and even though it is not intended to be used 
in the mechanical fashion in which it was used in this study. 

The use of scale analysis for selecting items does have some dis- 
advantages, however, in terms of the response distributions of items. 
Scales C and D selected items which were answered the same way by a 
large proportion of the respondents (80.2 per cent and 81.6 per cent). 
Moreover, scales C and D are inferior to the other two scales in terms 

of odd-even reliability. 


Discussion 


It has been the purpose of the present paper to present the results 
of an application of the Cornell Technique of Scale Analysis to an attitude 
scale in order to compare its workings with those of two methods which 
have been heretofore considered appropriate for scale refinement. There 
are certain side issues which come up in the use of the Cornell Technique 
which complicate its use in such circumstances. The chief obstacle is 
that one must consider several features of an item at the same time in 
manipulating data for analysis. For instance, the fewer response cate- 
gories an item has, the easier it will be to “reproduce” that item’s re- 
sponse, knowing an individual’s total score on the scale. A scale made 
up of dichotomized items thus has higher reproducibility than a scale with 
the same items with three or more responses. Also, the prediction of an 
individual’s response to a particular item can be made with greater 
accuracy if a very high percentage of the total group answer that item 
in the same way. A scale made up of only very popular and very 
unpopular items will, therefore, have higher reproducibility than one 
made up of items of varying degrees of popularity. To avoid spuriously 
high reproducibility resulting from many items of this sort, Guttman has 
set up the requirement that no category have more error in it than non- 
error. Thus when 90 per cent agree with an item in a scale, we must 




predict correctly, half of the time, from total score, not only who the 
90 per cent are who say agree, but who the 10 per cent are who disagree. 

It is difficult to process one's data keeping these various requirements 
in mind. (One wishes that the mechanics of scale analysis could receive 
the sort of synthesis and organization which the Wherry-Doolittle method 
provides in solving the problems of multiple regression.) In addition, 
one finds that the safeguards invented by Guttman occasionally permit 
worthless items to remain in the pool. The most serious weakness is in the 
more-error-than-non-error-per-category rule. It is possible to have an 
item with 99 per cent reproducibility which meets this rule, which is 
nonetheless worthless in that it has zero relation to the total score on 
the scale. If all but two persons agree with an item, and one of these 
has the highest total score and the other the lowest total score, then the 
item is valueless but meets the requirements Guttman sets forth. 
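
The worthless-item case can be checked numerically. The sketch below is a simplified cutting-point error count, not the full Cornell technique. It assumes 183 respondents, as in this study, with hypothetical total scores spread evenly from lowest to highest. Function names are illustrative.

```python
def item_reproducibility(item, totals):
    """Best per-item reproducibility when responses are predicted from a
    cutting point on total score (predict 1 at or above the cut, 0 below)."""
    n = len(item)
    # The extra cut above the maximum predicts 0 for every respondent.
    cuts = sorted(set(totals)) + [max(totals) + 1]
    fewest_errors = min(
        sum((t >= cut) != bool(x) for x, t in zip(item, totals)) for cut in cuts
    )
    return 1.0 - fewest_errors / n


def covariance_numerator(item, totals):
    """n*sum(xy) - sum(x)*sum(y); zero means zero correlation with total score."""
    n = len(item)
    return n * sum(x * t for x, t in zip(item, totals)) - sum(item) * sum(totals)


if __name__ == "__main__":
    n = 183                      # sample size of the study
    totals = list(range(n))      # hypothetical, evenly spread total scores
    item = [1] * n               # everyone agrees ...
    item[0] = 0                  # ... except the lowest scorer
    item[-1] = 0                 # ... and the highest scorer
    print(round(100 * item_reproducibility(item, totals), 1))
    print(covariance_numerator(item, totals))
```

Placing the cut just above the lowest scorer leaves one error in 183 responses, so reproducibility is above 99 per cent. The covariance with total score is exactly zero because the two dissenters sit symmetrically about the mean.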

Guttman's techniques cannot be used easily by research workers who 
have not had considerable experience with them.* Much judgment 
must be exercised in the combining of response categories and in balancing 
the several criteria of unidimensionality which Guttman has developed 
(reproducibility, more error than non-error in each category, items 
spaced at various intervals along the range of modal response 
frequencies). Special care must be taken to avoid the selection of too many 
items with high modal response frequencies, since such items, while 
having high per cent reproducibilities, tend to have low reliability and low 
discriminating power. 

Thus in one sense, Guttman's approach is a less satisfactory approach 
to the problems of scale refinement than the traditional methods. The 
worker is required to judge, first of all, whether or not items can logically 
be considered to belong together.* He must then scrutinize the pattern 
of responses of individuals to each of these selected items in terms of 
total scores on the scale and decide how best to combine item response 
categories in order to improve the scale. Throughout the entire analysis, 
there are no rigorous tests applied to determine which of several methods 
will work best. In fact, the worker must keep in mind several different 
item characteristics while he works. 

In spite of the mechanical difficulties of scale analysis, however, the 
writers find it a valuable and useful technique. The judgmental processes 
mentioned above do have the beneficial effect of compelling the 
investigator to become better acquainted with the data with which he 


Edwards (5) has shown that even Guttman's own published data may be reworked to 
yield different results than originally reported. 



works. The forcing of judgments on the worker constantly takes him 
back to the data themselves and this is highly desirable. Moreover, there 
are advantages in predicting a response from total score instead of the 
reverse, and in predicting from the total score instead of predicting the 
response to one item from the response to another item. Consider a 
scale which obviously has perfect unidimensionality; for instance, the 
questions: Are you over 10 years old? Are you over 20 years old? Are 
you over 30 years old?, etc. Knowing the total score, the responses to 
every item can be reproduced with perfection. Knowing the response 
to only one item, one may or may not be able to predict the responses to 
all of the other items, and one cannot, therefore, always predict the total 
score without error. High reproducibility, therefore, has more meaning 
in defining the unidimensionality of a scale than either high item-versus- 
total score correlations or high item-versus-item correlations (4). 
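
The age-question example can be written out in a few lines of Python. The respondent ages and function names are hypothetical and serve only to illustrate the argument.

```python
THRESHOLDS = (10, 20, 30)   # "Are you over 10 / 20 / 30 years old?"


def responses(age):
    """Answers (1 = yes, 0 = no) to the three cumulative age questions."""
    return tuple(int(age > limit) for limit in THRESHOLDS)


def pattern_from_total(total, n_items=len(THRESHOLDS)):
    """Reproduce the full response pattern from the total score alone."""
    return tuple(int(i < total) for i in range(n_items))


if __name__ == "__main__":
    ages = [5, 15, 25, 35, 60]   # hypothetical respondents
    for age in ages:
        answers = responses(age)
        # Total score is enough to recover every answer on a perfect scale.
        assert pattern_from_total(sum(answers)) == answers

    # Knowing only the answer to "over 20?" leaves the total score open.
    possible_totals = {sum(responses(a)) for a in ages if responses(a)[1] == 1}
    print(sorted(possible_totals))
```

Every pattern is recovered from the total score. A "yes" to the second question alone is compatible with more than one total, which is the asymmetry the paragraph describes.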

Finally, one must avoid thinking of scalability, as defined by Guttman, 
as a “good” characteristic of a series of items and of non-scalability as a 
“bad” characteristic. The use which is to be made of the series of items 
must always be considered. If the measure in question is to be used as a 
predictor variable for instance, scalability may be irrelevant or even 
undesirable. If the measure is to be used in a study of mental or per- 
sonality organization (perhaps as a measure of what Gordon Allport has 
called a “common trait”), it should represent but one dimension, and, 
hence, should be scalable. The measurement of public opinion also 
makes profitable use of scales having high reproducibility. 

In summary, the writers feel that Guttman’s new scale analysis 
techniques can prove to be very useful in problems of psychological 
measurement.’ Considerable discretion must be exercised, however, 
both in the selection of suitable problems to which these methods may 
be applied and in the way the methods themselves are handled. 


Received September 4, 1948. 


References 


1. Cheshire, L., Saffir, M., and Thurstone, L. L. Computing diagrams for the tetrachoric 
correlation coefficient. University of Chicago Book Store, Chicago, 1933. 
2. Clark, K. E., and Kriedt, P. H. An application of Guttman's new scaling 
techniques to an attitude questionnaire. Educ. psychol. Measmt., 1948, 8, 215-224. 
‘For a further discussion of the im sists b’s 
analysis of the “trait status” score (3), “ale reproducibility, see Coom 


In the present article the writers have attempted to call attention to the advantages 


aan F a lysis methods in terms of the results obtained when these 
ean Ha yed. hat ba oo do not, in practice, live up to the promise they 

ow in theoretical terms may be due in part to $ A E Jaio. 
For a discussion of this point see Clark and K: Bee at ak analyeis ic oni 




3. Coombs, C. H. Some hypotheses for the analysis of qualitative variables. Psychol. 
Rev., 1948, 55, 167-174. 
4. A simple test for predicting opinions from their subclasses. Int. J. 
Opin. Attitude Res., 1948, 2, 1-25. 
5. Edwards, A. L. On Guttman's scale analysis. Educ. psychol. Measmt., 1948, 8, 
318. 
6. Edwards, A. L., and Kilpatrick, F. P. A technique for the construction of attitude 
scales. J. appl. Psychol., 1948, 32, 374-384. 
7. Gage, N. L. Scaling and factorial design in opinion poll analysis. Purdue Univ. 
Studies in Higher Educ. LXI, 1947. pp. vi + 87. 
8. Guttman, L. The Cornell technique for scale and intensity analysis. Educ. psychol. 
Measmt., 1947, 7, 247-280. 
9. Jurgensen, C. E. Table for determining phi coefficients. Psychometrika, 1947, 12, 


The Airline Pilot’s Job * 
Thomas Gordon 
American Institute for Research, Pittsburgh, Pa. 


It is the purpose of this paper to report certain aspects of a study 
conducted by the Aviation Branch of the American Institute for Research 
under the auspices of the National Research Council Committee on 
Aviation Psychology.¹ Funds for the project were furnished by the 
Civil Aeronautics Administration. This study, completed in November, 
1947, was undertaken (1) to study current methods of selecting and 
evaluating the airline pilot and (2) to determine the critical requirements 
of his job. It was intended that the data obtained in this investigation 
be used as a basis upon which to develop improved procedures for 
selecting, training, and certifying airline pilots. At present the American 
Institute for Research is utilizing the data as a basis for devising a 
radically new type of flight examination for pilots seeking the Airline 
Transport Rating certificate. This latter project is under the same 
sponsorship as the study to be described in this paper. 

In the first phase of the study the general procedure followed was to 
survey the available sources of information pertaining to present methods 
of selecting and evaluating airline pilots. In the second phase of the 
project the procedure was to survey sources of information about the 
critical requirements of the airline pilot’s job, an attempt being made 
to answer the question: “What behavior and characteristics are re- 
quired for handling the job safely and effectively?” 


Methods of Selecting the Airline Pilot 


Methods of selecting applicants for the job of airline pilot were studied 
by examining the personnel records of 432 pilots from five major airline 
companies. The technique employed was to obtain the records of pilots 
who had been released by their companies because of lack of flying pro- 
ficiency during the period between initial hiring and the time when they 


* Parts of this paper were read at the Meeting of the Aero-Medical Association in 
Toronto, Canada, on June 17, 1948. The author's report of the entire study has been 
published as Research Report No. 73 by the Civil Aeronautics Administration, Division 


a s bere ie Fite: sig pee of this committee and to John C. gry 
: ance during and to the members of the Aviation Branch of the 
American Institute for Research who assisted in conducting the study. 




would have qualified as an airline captain. These pilots constituted the 
experimental group (E-group). Then the records were obtained on a 
group of pilots who had not been eliminated but were em- 


pany. Adequate data were available for both the experimental and 
control groups on eight variables of the type currently established by 
airline companies as selection requirements for pilot applicants. These 
variables were: (1) Age at time of hiring; (2) Previous education; (3) 
Otis Test I.Q. scores; (4) Bennett Test of Mechanical Comprehension 
(Form AA) scores; (5) Minnesota Multiphasic Personality Inventory 
scores; (6) Previous flying hours; (7) Marital status; and (8) Previous 
ground training in aeronautical subjects. 

The experimental group and the control group were compared on 
each of these variables. Data were not available for all of the pilots in 
each group on each separate variable. The findings, summarized in 
Table 1, show that the difference between the group of eliminated pilots 
(E-group) and the group of successful pilots (C-group) on no one of the 
eight variables was statistically significant even at the 5% level of 
significance. These results indicate rather conclusively that present 
requirements established by airline companies for selection of applicants are not 
adequate for predicting later success or failure with much confidence. 
Furthermore, because none of the selection procedures differentiated 
between eliminated and successful pilots it was not possible to derive 
from these procedures any clues as to the critical requirements of the 
pilot's job. 
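
Each t-ratio in Table 1 is the mean difference divided by its standard error, so the printed values can be checked directly. The sketch below uses three rows as they appear in the table. The dictionary and function names are illustrative, and 1.96 is the large-sample two-tailed 5% value. The small groups here would call for a somewhat larger critical value, which only strengthens the conclusion.

```python
# (mean difference C minus E, standard error of difference, t-ratio as printed)
TABLE_1_ROWS = {
    "Otis I.Q.": (2.40, 1.36, 1.762),
    "Bennett Mechanical Comprehension": (6.14, 7.24, 0.848),
    "MMPI Psychopathic Deviate": (3.47, 3.07, 1.130),
}

LARGE_SAMPLE_CRITICAL_5_PERCENT = 1.96   # two-tailed, normal approximation


def t_ratio(mean_difference, standard_error):
    """Ratio of a mean difference to its standard error, sign ignored."""
    return abs(mean_difference) / standard_error


if __name__ == "__main__":
    for name, (diff, se, printed) in TABLE_1_ROWS.items():
        t = t_ratio(diff, se)
        significant = t >= LARGE_SAMPLE_CRITICAL_5_PERCENT
        print(f"{name}: recomputed {t:.3f}, printed {printed:.3f}, significant: {significant}")
```

The recomputed ratios agree with the printed ones to within rounding of the tabled means and standard errors, and none reaches the 5% level.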


Methods of Evaluating the Airline Pilot 


The methods and procedures used by airlines for evaluating their 
pilots also were surveyed in this study. Information pertaining to 
evaluation procedures was obtained primarily through examination of 
company records of the flight performance and ground school achievement 
of both eliminated and currently employed pilots and through scrutiny 
of the flight examinations used by airlines. Pilots' and check-pilots' 
attitudes toward present methods of evaluation and their suggestions for 
improvement were obtained through individual interviews. The findings 
in regard to methods of evaluation currently used by airline companies 
may be summarized as follows: 


1. There exists a great amount of variation between airline companies 
as to the adequacy of the training records maintained on their pilots. 
There were practically no records of flight tests in the files of some of the 
pilots. 


Table 1 


Comparison of Eliminated and Successful Airline Pilot Trainees on Selection 
Requirements Established by Airline Companies 


                                       Number of Pilots       Mean       Standard 
                                       ----------------    Difference    Error of 
Selection Requirements                 E-group  C-group   (C minus E)   Difference   t-ratio* 

1. Age at Time of Employment             169      166      -.65 yrs.       .50        1.297 
2. Amount of Education at Time of 
   Employment Beyond High School         170      169       .18 yrs.       .24         .738 
3. Otis I.Q.'s                            63       63      2.40           1.36        1.762 
4. Bennett Test of Mechanical 
   Comprehension Scores (Form AA)         14       14      6.14           7.24         .848 
5. Minnesota Multiphasic Personality 
   Inventory Scores: 
     Hypochondriasis Scale                15       15     -1.50           1.49         .222 
     Depression Scale                     16       16      1.00           2.91         .514 
     Hysteria Scale                       18       18      1.00           1.81         .553 
     Psychopath. Deviate Scale            15       15      3.47           3.07        1.130 
     Interest Scale                       17       17     -1.71           3.03         .564 
     Paranoia Scale                       17       17      1.29           1.65         .780 
     Psychasthenia Scale                  15       15       .00           2.29         .000 
     Schizophrenia Scale                  15       15      -.87           1.60         .544 
     Hypomania Scale                      16       16      2.81           1.44        1.957 
6. Number of Previous Flying Hours       165      171     14.7 hrs.     128.92         .113 
7. Marital Status                        170      168     (117 married in E-group, 119 in C-group) 
8. Previous Training in Aeronautical 
   Subjects                              214      214     (134 with previous training in both 
                                                           E-group and C-group) 

* None of the mean differences were statistically significant at the 5% level of 
significance. 


2. All of the airlines rely upon periodic flight examinations, called 
flight-checks, for obtaining evaluations of their pilots. The maneuvers 
that make up the flight-check vary from one airline to another and vary 
somewhat within a single airline from one check-pilot to another. 

3. In general, on these flight-checks a pilot is rated against the 
standard set up by the particular check-pilot rather than against an 
objective standard. For example, pilots are usually rated on a scale, 
such as “Standard-Substandard,” “Good-Average-Below Average,” or 
“1-2-3-4-5.” For a few maneuvers, however, some airlines have tried to 
establish more objective standards in the form of “limits” for altitude, 
speed or heading, within which the examinee is required to keep the 
plane in order to achieve a passing rating. Even the application of these 
limits was found to vary among check-pilots within a single airline. 

4. The “halo effect” was found to be operating in the ratings of flight 
performance. From examination of the records of past flight-checks, it 
was found that when a pilot received a below-average rating on one 
maneuver there was a very strong tendency for him to get below-average 
ratings on all subsequent maneuvers. The ratings did not differentiate 
between the pilots' strengths and weaknesses. In other words, they are 
not useful as diagnostic performance records. One explanation of the 
lack of discrimination of the ratings on different maneuvers is the fact 
that it is common practice to make no record of the pilot's performance 
during the flight. The flight-test forms are usually filled out after the 
flight, thus increasing the chance that the quality of a pilot's performance 
on specific maneuvers might be forgotten. 

The discovery of the inadequacy of records of performance, the lack
of standardized evaluation procedures and the subjectivity of the measures
of proficiency parallels the findings of Army Air Force psychologists
whose research in the area of pilot proficiency measures is reported in the
volume edited by Miller (5). Similar findings have been reported
in a study of the flight-checks used by the Civil Aeronautics
Administration (1).

The Critical Requirements of the Job 


A second objective of the study was to determine the critical requirements
of the job of airline pilot. This involved an analysis of the job
with particular emphasis upon isolating those job requirements which
are the most critical. In this approach, “critical requirements” are
defined as those job requirements, expressed in behavioral terms, which
have proved to be important factors in differentiating successful or
unsuccessful performance on the job. The assumption underlying this
approach is that the most critical differences between the safe and effective
pilot and the one who is not will be revealed by focusing the job analysis
upon situations where the behavior of pilots has been shown to make a
difference. To use a common expression of pilots, the critical requirement
approach attempts to determine “what separates the men from
the boys.” Flanagan (2) has described the use of this job analysis
method in the study of causes of mission failures in the Army Air Forces,
and he has stated that such a determination of critical requirements is
the principal objective of job analysis procedures.


Although this approach resembles other methods of job analysis, it 
also differs from them in an important respect. It is common practice 
in most job analysis approaches to collect long lists of job requirements, 
after which it is necessary to submit them to experts (usually psycholo- 
gists, supervisors, top management) for judgments of their relative im- 
portance for success on the job. The critical requirement approach, 
however, yields at the outset only the critical requirements, and it relies 
more upon the participants on the job for judgments of what is critical 
or upon actual records of situations where behavior has been critical. 
For example, in this study we relied upon the following sources of infor- 
mation: 


1. An analysis was made of the records of all scheduled domestic airline
accidents, during the period of 1938 through 1946, in which the behavior
of the pilot was judged a contributing factor in the accident. From each
of 121 such accident reports we extracted a description of the specific
behavior of the pilot prior to and during the accident and the circumstances
leading up to the accident.

2. Interviews were conducted with airline pilots and check-pilots for
the purpose of obtaining a larger sample of critical incidents than was
provided by the accident reports. Questions were devised which would
yield examples of critical situations rather than commonplace or everyday
occurrences. This “critical incident technique” required the pilots
to recall recent events or incidents in which they did something which
created an unsafe situation, thus minimizing discussions of traits or
stereotyped opinions as to the requirements of the job. Examples of
“critical incident” questions used are:


a. “Probably all pilots who have flown a lot have done something at one
time or another that got them into an uncomfortable situation or even a
near-accident. We would like to get several examples of such things you have
done. Would you describe the most recent situation in which you did
[...] and tell me just what you did?”

b. “[...] I would like for you to recall the last time you had to take over
the controls from a co-pilot because you felt the situation was pretty critical.
Would you describe that situation and tell me just what the co-pilot did or
[...] taken over?”

c. “[...] would like [...] on your experience as a check-pilot to get
examples of what pilots do on check-rides. Would you think [...] the last
[...] a check-ride and tell me exactly what he did which caused [...]”


From questions such as these we obtained 333 usable incidents from
270 interviews. Interviewing was done in 18 cities with pilots from 27
different scheduled and non-scheduled airline companies. The pilots
were selected in a fairly random manner. The determining factor for
selection generally was the presence of the pilots at the airport in
preparation for a flight or at the completion of a flight on the particular days
the interviewers visited the airport. The questions were standard for
each interview and the interviewers wrote down the responses of the
pilots on standard forms. An example of the kind of incidents obtained
is the following:
“On daytime flight from New York to Miami in DC-3, the weather was
clear with wind gusts up to 50 mph. They were landing at Raleigh with
approximately 35 mph wind about 45° across runway. The co-pilot was
landing the plane from the left seat. He came in too slow and was just about
to touch the runway going sideways and with the downwind wing dangerously
low. The captain was afraid the co-pilot might land hard enough going
sideways to buckle the landing gear or get the wing low enough to cause a
ground loop. The captain took the controls, added power, corrected for drift
and landed OK. The captain stated that his co-pilot was inexperienced.”


The next step in the analysis involved extracting from each incident
the specific pilot acts contributing to the accident or near-accident. For
example, in the incident above, the following critical pilot acts were
extracted: (1) Executed landing approach at too low an airspeed and
[...]; (2) Drifted or was not aligned with runway during round-out. The
above step yielded 787 specific pilot acts, each of which had contributed
to an accident or an unsafe situation. This group of acts was subjected
to a further analysis in which all the acts were sorted into 21 smaller
groups or clusters of homogeneous acts. These clusters made up or
defined the critical components of the job of airline pilot. For example,
it was found that there were four categories of errors all having to do
with the operation of controls and switches: 41 instances in which
pilots forgot to operate a control or switch, 31 of confusing two controls
or switches, 14 of improperly adjusting a control or moving a switch in
the wrong direction, and 6 of inadvertent operation of a control or switch.
These four different kinds of errors of operating controls and switches
formed a cluster which defined a specific job component. Twenty-one
such components were extracted from the data.

As would be expected, it was found that certain components ranked
higher than others as judged by the frequency of the specific pilot errors
classified in the particular components.
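
The roll-up from individual error categories to a job component can be illustrated with the four control-and-switch categories counted above. This is only a sketch: the dictionary labels are paraphrased and the variable names are assumptions, while the counts (41, 31, 14, 6) and the overall total of 787 acts are taken from the text.

```python
# Counts of control/switch errors as reported in the text; the labels are
# paraphrased and the variable names are illustrative assumptions.
control_switch_errors = {
    "forgot to operate a control or switch": 41,
    "confused two controls or switches": 31,
    "adjusted a control improperly or moved a switch the wrong way": 14,
    "operated a control or switch inadvertently": 6,
}

TOTAL_PILOT_ACTS = 787  # all specific pilot acts extracted in the study

# A component's frequency is the sum of the errors sorted into its cluster.
component_frequency = sum(control_switch_errors.values())

# Share of all extracted acts that fall in this one component.
component_share = component_frequency / TOTAL_PILOT_ACTS

print(f"Operating controls and switches: {component_frequency} errors "
      f"({component_share:.1%} of {TOTAL_PILOT_ACTS} acts)")
```

Ranking the 21 components then amounts to sorting them by frequencies obtained in this way.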
The components of the airline pilot’s job found most critical are
shown in Table 2. In column (a) are listed the 21 critical requirements
or components of the job which were obtained by classifying or grouping
similar pilot errors extracted from three sources: (1) from analysis of
airline accidents, (2) from analysis of incidents or near-accidents experienced
by airline pilots, and (3) from analysis of pilot errors reported by
check-pilots as reasons for failing pilots or for taking over controls from pilots
on check-rides or flight examinations. The frequencies with which errors
were obtained from each of these three sources are shown in columns (b),
(c), and (d). The total frequencies of errors from all sources are shown
in column (e).


Table 2

Critical Requirements of the Job of Airline Pilot Determined by Frequency of Errors
Extracted from Accident Reports, Critical Incidents and Flight-Checks

                                                        Frequency of Errors
(a)                                                  (b)     (c)     (d)     (e)
Critical Requirements                              Acci-   Inci- Flight-
                                                   dents   dents  checks   Total

 1. Establishing and maintaining angle of glide,
    rate of descent, and gliding speed on
    approach to landing                               47      41      11      99
 2. Operating controls and switches                   15      44      33      92
 3. Navigating and orienting                           4      39      19      62
 4. Maintaining safe airspeed and attitude,
    recovering from stalls and spins                  11      28      18      57
 5. Following instrument flight procedures and
    observing instrument flight regulations            5      27      13      45
 6. Carrying out cockpit procedures and routines       7      31       4      42
 7. Establishing and maintaining alignment with
    runway on approach or takeoff climb                3      31       5      39
 8. Attending, remaining alert, maintaining
    lookout                                           14      23       1      38
 9. Utilizing and applying essential pilot
    information                                        0      19      18      37
10. Reading, checking and observing instruments,
    dials and gauges                                   1      26       7      34
11. Preparing and planning of flight                   2      27       3      32
12. Judging type of landing or recovering from
    missed or poor landing                             1      23       8      32
13. Breaking angle of glide on landing                 1      25       5      31
14. Obtaining and utilizing instructions and
    information from control personnel                 3      21       0      24
15. Reacting in an organized manner to unusual
    or emergency situations                            0      17       7      24
16. [...] plane safely on ground                       7      15       1      23
17. Flying with precision and accuracy                 0       7      15      22
18. Operating and attending to radio                   0       7      10      17
19. Handling of controls smoothly and with
    coordination                                       0       6       8      14
20. Preventing plane from undue stress                 0       5       7      12
21. Taking safety precautions                          2       5       4      11

Table 3 presents the correlations between the rank order of the critical
requirements as determined from the frequencies of pilot errors obtained
from the three sources. These were computed in order to answer the
question: “To what extent do you obtain similar indices of the relative
‘critical-ness’ of the various job requirements from analyses of critical
behavior in accidents, in incidents or near-accidents and on flight-checks?”


Table 3

Correlations Between Rank Order of Critical Requirements as Determined
from Three Sources of Pilot Behavior
(Spearman Rho Coefficients)

                 Incidents               Flight-checks
Accidents        .71 (S.E. = .12)*       -.04 (S.E. = .23)
Incidents                                 .28 (S.E. = .22)

* An r of .43 is necessary for significance at the 5% level and an r of .55 for
significance at the 1% level (4).
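
The rank-order coefficients in Table 3 can be approximated directly from the Table 2 frequencies. The sketch below assumes the classical formula rho = 1 - 6Σd²/(n(n² - 1)) with tied frequencies given the average of their ranks; the article does not say how ties were handled, so this is one plausible reading rather than the author's procedure. The three lists hold the accident, incident and flight-check frequencies for the 21 requirements in Table 2 order (each row was checked against its printed total), and the function names are illustrative.

```python
# Error frequencies for the 21 critical requirements, in Table 2 order.
ACCIDENTS = [47, 15, 4, 11, 5, 7, 3, 14, 0, 1, 2, 1, 1, 3, 0, 7, 0, 0, 0, 0, 2]
INCIDENTS = [41, 44, 39, 28, 27, 31, 31, 23, 19, 26, 27, 23, 25, 21, 17, 15,
             7, 7, 6, 5, 5]
FLIGHT_CHECKS = [11, 33, 19, 18, 13, 4, 5, 1, 18, 7, 3, 8, 5, 0, 7, 1, 15, 10,
                 8, 7, 4]


def average_ranks(values):
    """Rank from largest (rank 1) to smallest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Classical rank-difference formula (an assumption about the method used)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


for label, a, b in [
    ("Accidents vs. incidents", ACCIDENTS, INCIDENTS),
    ("Accidents vs. flight-checks", ACCIDENTS, FLIGHT_CHECKS),
    ("Incidents vs. flight-checks", INCIDENTS, FLIGHT_CHECKS),
]:
    print(f"{label}: rho = {spearman_rho(a, b):+.2f}")
```

Under these assumptions the three coefficients come out at roughly .72, -.04 and .28, close to the tabled values; small discrepancies would be expected if ties were ranked differently in the original computation.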


The high positive correlation between the rank order of the critical 
requirements determined from analysis of airline accidents and from 
analysis of the incidents reported by pilots indicates that with the critical 
incident technique we accomplished the objective of obtaining job require- 
ments which are critical from the standpoint of safe flying. The low 
negative correlation between the rank order of the critical requirements 
as determined by analysis of accidents and as determined by analysis of 
behavior of pilots on flight-checks might be interpreted as an indication 
that check-pilots’ reasons for failing pilots on flight-checks are not closely 
related to the requirements of the job which seem to make a difference 
between safe and unsafe airline flying. It may well be that the present 
flight-checks do not provide an adequate evaluation of the extent to 
which pilots demonstrate proficiency in the most critical aspects of airline 
flying. Check-pilots seem to be emphasizing proficiency in different 
aspects of the job, such as flying the plane smoothly and keeping within 
very precise limits of altitude, airspeed and heading. These require- 
ments, of course, are probably important from the standpoint of the com- 
fort of passengers, but at the same time they shouldn’t be emphasized at 
the expense of neglecting requirements which are more critical from the 
standpoint of safety. 


Summary 


The lack of statistically reliable differences between eliminated and
successful pilots indicates that present methods of selection do not predict
success or failure in training. To achieve this, it is probably necessary
for airlines to utilize new procedures which have been validated against
this or some similar criterion, rather than rely on standardized tests
and interview procedures which, although of possible usefulness for 
predicting success in other fields, have not been validated as predictors 
of success in airline piloting. Furthermore, it would appear from this 
survey that training and proficiency records of airline companies are 
inadequate for use as criteria of proficiency or for providing a diagnostic 
picture of the proficiencies of their pilots. The findings also suggest 
that the subjective type of flight-check currently employed by airlines 
cannot adequately provide an objective evaluation of the extent to which 
pilots meet the most critical requirements of the job. 

From the results of the analysis of pilot errors extracted from various 
critical incidents it has been shown which components of the pilot’s job 
are most critical from the standpoint of safety and effectiveness on the job. 
The implications of these results are rather obvious, but some may be 
mentioned briefly: 


1. The most critical requirements should receive more emphasis by 
check-pilots who evaluate the proficiency of pilots on flight examinations. 
This study suggests that special emphasis is needed on the landing
approach, accurate operation of controls, methods of navigating and
orienting, maintaining safe airspeeds, and compensating for drift. These
data are now being used as a basis for developing a more objective flight 
examination, as mentioned earlier. 

2. The findings suggest a need for improved cockpit design to simplify 
the pilot’s job and reduce the possibility of such errors as: confusing two 
controls, making improper adjustments of controls and inadvertently 
operating controls. 

3. The findings might be used to suggest which components of the job
need greater emphasis in the training program for pilots. Pilot trainees
could be informed which errors are most frequently made in airline
flying.

4. The list of critical requirements should prove useful in devising im- 
proved methods of selecting pilots, inasmuch as they provide valuable 
clues as to the critical aptitudes needed by a safe airline pilot. 


Finally, it would seem that the results of this study furnish evidence
that the “critical incident technique” is a very useful method of isolating
the critical requirements of a particular job. This method increased
the size of the sample of critical incidents, which if restricted to accidents
alone would have been too small to yield a sufficient number of pilot errors
upon which to base the list of critical requirements.


Received August 22, 1948. 




References 


1. [...] L., Kogan, L. S., Odbert, H. S., and Wapner, S. An analysis of inspectors’
ratings of check-flights as recorded on ACA 342Z. Washington: CAA Division of
Research, Report No. 58, March 1946.

2. Flanagan, John C. (Ed.). The aviation psychology program in the Army Air Forces.
[...] (AAF Aviation Psychology Program Research Report No. 1.)

3. Gordon, T. The airline pilot: survey of the critical requirements of his job and of pilot
evaluation and selection procedures. Washington: CAA Division of Research,
Report No. 73, November 1947.

4. Guilford, J. P. Fundamental statistics in psychology and education. New York:
McGraw-Hill, 1942.

5. Miller, N. E. (Ed.). Psychological research on pilot training. Washington: U. S.
Government Printing Office, 1947. (AAF Aviation Psychology Program Research
Report No. 8.)


Factors Related to Life Insurance Selling * 


D. F. Kahn and J. M. Hadley 
Division of Applied Psychology, Purdue University 


The purpose of the study has been three-fold: first, to determine the 
degree of relationship that exists between relative success in the early 
period of selling life insurance and success at a later period; second, to 
examine various selling activities with a view to uncovering certain 
factors which differentiate successful from unsuccessful agents, and to 
select such factors as might contribute to the refinement of life-insurance 
training programs; third, to investigate further certain personal history 
items and personality traits already known to correlate with success in 
selling life insurance, and to analyze other measurable areas of personality, 
with the aim of increasing the sensitivity of existing selection methods. 
The identification of individuals for whom the likelihood of success is 
known would not only benefit management, but would, to some extent, 
minimize feelings of frustration on the part of the agent who, from the 
outset, may be doomed to failure. 


Procedure 


The subjects considered in the present investigation were a group of 
84 new life insurance agents who had attended Class I and Class II of the 
Purdue Course in Life Insurance Marketing (1). Each subject selected 
for study had received a five-week basic course at the school and had 
also completed thirteen weeks of selling in the field. Production records 
for the most part were collected during the year 1946.¹ The salesmen
represented 19 life insurance companies and 22 states (3). Well over
half of the group were veterans attending the school under public
laws affording veterans educational advantages.

Data were collected in three major areas: selling activities, personal
history items, and psychological measures.

* This paper is a condensation of the senior author’s dissertation of the same title,
completed under the direction of the junior author. The dissertation was submitted
to Purdue University in partial fulfilment of the requirements for the degree of
Doctor of Philosophy, August 1948, and is on file in the Purdue University Libraries.

¹ The Course is a one-year plan divided into 15 weeks of classroom training
and approximately 37 weeks of supervised field work. The 15-week in-residence
training is divided into three five-week sessions. The first session takes place before
any actual selling is done; the other two interrupt the 37-week selling period at intervals
of 12 to 16 weeks each. Weekly records are forwarded to the school by
the agency manager during the field period. We are deeply indebted to
the director of the Purdue Course in Life Insurance Marketing, Mr. D. P. Cahill, for
his assistance and for making available the data upon which this study is based.
Selling activities included: (1) size of application written, (2) number
of calls made. A call is defined as a face-to-face conversation with a
potential buyer. (3) Number asked to buy. “Asked to buy” is defined
as a salesman’s discussing life insurance with a client where the latter
is asked to take action toward securing a policy, and (4) number of
applications written. An application is defined as a signed application
for a life insurance policy.

Relationships between calls, “asked to buy,” and applications written
were investigated under the following headings: (1) percentage of applications
written to total number of calls made, (2) percentage of persons
asked to buy to total number of calls made, (3) percentage of applications
written to total number of persons “asked to buy.” Because of
the unequal lengths of reported field work for Class I and Class II, 37 and
39 weeks respectively, and further because some of the agents, for one
reason or another, withdrew from the course before completion, the
measures, calls, “asked to buy,” number of applications written, and
production were computed on the basis of a weekly average.
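
The three percentages and the weekly averages described above reduce to simple ratios of an agent's field totals. The sketch below is an illustration only: the `FieldRecord` layout, the function names and the example figures are hypothetical and are not taken from the study's records.

```python
from dataclasses import dataclass


@dataclass
class FieldRecord:
    """Totals reported by one agent over his field period (hypothetical layout)."""
    weeks: int
    calls: int
    asked_to_buy: int
    applications: int
    production_dollars: float


def weekly_averages(rec):
    """Put agents with unequal field periods on a common per-week footing."""
    return {
        "calls": rec.calls / rec.weeks,
        "asked_to_buy": rec.asked_to_buy / rec.weeks,
        "applications": rec.applications / rec.weeks,
        "production_dollars": rec.production_dollars / rec.weeks,
    }


def selling_percentages(rec):
    """The three relationships among calls, asked-to-buy and applications."""
    return {
        "applications_to_calls": 100 * rec.applications / rec.calls,
        "asked_to_calls": 100 * rec.asked_to_buy / rec.calls,
        "applications_to_asked": 100 * rec.applications / rec.asked_to_buy,
    }


# Invented example: an agent reporting for 37 weeks.
example = FieldRecord(weeks=37, calls=740, asked_to_buy=296,
                      applications=74, production_dollars=222000.0)
print(weekly_averages(example))
print(selling_percentages(example))
```

Dividing by the number of weeks reported, rather than by a fixed course length, is what allows agents who withdrew early to be compared with those who completed the course.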
During the initial five-week training period a personal history questionnaire
and a battery of psychological tests were administered. Personal
history items analyzed were: (1) age, (2) number of dependents, (3)
living expense per month, and (4) life insurance owned, including National
Service Life Insurance.


Psychological measures investigated included: (1) Kuder Preference
Record (5), (2) Guilford-Martin Personnel Inventory Number I (2), (3)
the previously mentioned test, used to measure degree of uncertainty as
determined from the number of questions responded to as undecided,
“?”, (4) The Adaptability Test (8), (5) Part II² of the Aptitude Index (7).

The measure adopted in the present study as the criterion of success
was the production records on file with the Purdue insurance school;
these records were based on the total of signed applications, that is,
written business, and not the total of business signed, examined and paid
for, which is generally referred to as paid-for business. Varying lengths
of school attendance necessitated reducing total production to average
weekly production in order to evaluate the relative success of each agent.

² Letter grades reported on the Aptitude Index refer to Part II only, and should not
be confused with the letter grades usually cited for the entire instrument, and which
are derived from an age-weighted combination of Part I and Part II of the Index.
A special form, Part TIT, that is to say, the Personal History portion of the Aptitude
Index which was devised for use with former service men, was administered but not
incorporated into this study because of the inaccuracy and the incompleteness of
responses. Three of the items appearing on Part I of the Index, number of dependents,
living expenses per month, and amount of life insurance owned, however, have been
analyzed separately in this study.
In an endeavor to ascertain the relationships between early pro- 
duction and later production, two correlations were computed. The 
first of these considered the relationship between the average weekly 
production for the first 13 weeks and the average weekly production for 
the time spent in the field over and above those 13 weeks. Two agents 
who failed to report to the school after the thirteenth week were elimi- 
nated from this correlation, and thus 82 agents were left who had com- 
pleted from 15 weeks to the maximum school period of 39 weeks in the 
field. Approximately 69 per cent of this group had completed the course. 
The second correlation computed was a measure of the same type of 
relationship; however, this time only those salesmen who had reported a 
minimum of 26 weeks were considered. It was felt that whatever rela- 
tionship would be found to exist in the latter correlation would be a more 
accurate reflection of differences between early and late selling, since the 
first 13 weeks would be compared with a selling period of equal or greater 
length. Sixty-five cases meeting such a requirement were found, approxi- 
mately 87 per cent of whom had completed the course. 

In order to fulfill the second and third purposes of this study, that is,
to determine whether or not the measures selected would differentiate 
successful from unsuccessful salesmen, two such contrasting groups of 
agents were identified. In making such a distinction, the agents were, in 
the first place, ranked from high to low on their total sales while attending 
the school. The production records covering the duration of the school 
term for those agents who, for one reason or another, did not complete 
the course, but did continue to sell insurance, were obtained from the
respective agency managers under whom the subjects were selling. 
These records were used as checks against the agents’ weekly reports of 
production made to the school, and were thus used to substantiate the 
original rank order of the agents in the study. Six of the group of 84 
salesmen withdrew from the school at an early date after the first 13- 
week period, and hence their records were incomplete. These agents had 
either terminated with their companies, or had continued as life insurance 
salesmen, but for various reasons further records on their selling activities 
were not made available. It was felt that the inclusion of these men in
the analysis would, to some extent, invalidate the rather stable ranking 
of the remaining 78 agents. For these reasons six agents were omitted 
from this part of the study. The ranked average weekly production 
records were divided into three equal groups, numbering 26 agents each. 
The high and the low groups (average weekly production $8,602 and 
$2,181 respectively) were designated as successful and unsuccessful.
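
The grouping step just described, ranking agents by average weekly production and cutting the ranking into three equal groups, can be sketched as follows. The production figures are randomly generated placeholders and the function name is an assumption; only the group sizes (78 agents, 26 per group) come from the text.

```python
import random


def split_into_thirds(weekly_production):
    """Rank from high to low and cut into three groups of equal size.

    If the number of agents is not divisible by three, the remainder
    falls into the low group; with 78 agents each group has 26.
    """
    ranked = sorted(weekly_production, reverse=True)
    size = len(ranked) // 3
    high = ranked[:size]
    middle = ranked[size:2 * size]
    low = ranked[2 * size:]
    return high, middle, low


# Placeholder data: 78 invented average weekly production figures (dollars).
random.seed(0)
agents = [round(random.uniform(1000, 12000)) for _ in range(78)]

high, middle, low = split_into_thirds(agents)
print(len(high), len(middle), len(low))
print("mean of high group:", sum(high) / len(high))
print("mean of low group:", sum(low) / len(low))
```

Comparing only the extreme thirds discards the middle group, which sharpens the contrast between the two groups at the cost of a smaller sample.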


In an attempt to locate new test items which might be of value in
selection devices, an item analysis was undertaken on the 150
items appearing on the Guilford-Martin Personnel Inventory Number I.
The D-value method based on Lawshe’s nomograph (6) adapted
from the Kelley technique (4) was employed in this part of the procedure.
Items responded to by the agents as uncertain, “?”, were grouped
with the “No” responses.


Results


A correlation of +.61, based on the records of 82 agents, was obtained
between average weekly production for the first 13 weeks and average
weekly production for a period of from two to 26 weeks beyond the initial
period. For the group of 65 agents who had completed at least a second
selling period, approximately 87 per cent of whom had reported
for the entire 37 or 39 week course, a correlation between the
average weekly production for the first 13 weeks and the average weekly
production for at least a second 13 weeks was found to be +.55. Both
of the above-mentioned correlations are significant beyond the one per
cent level of confidence.
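
One way to check the significance of these two correlations is the usual t test for a correlation coefficient, t = r√(n - 2)/√(1 - r²) with n - 2 degrees of freedom. The article does not state which test the authors used, so the sketch below is an assumption about method; the function name is illustrative, and the r and n values are those reported above.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a correlation against zero (n - 2 d.f.)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Reported correlations and sample sizes.
t_82 = t_from_r(0.61, 82)
t_65 = t_from_r(0.55, 65)

# For 60 to 80 degrees of freedom the two-tailed 1% critical value of t
# is roughly 2.64 to 2.66, so both statistics exceed it by a wide margin.
print(f"r = .61, n = 82: t = {t_82:.2f}")
print(f"r = .55, n = 65: t = {t_65:.2f}")
```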
Analysis of the selling activities measured revealed that several significant
differences existed between the groups of successful and unsuccessful
salesmen. The successful salesmen were higher in every comparison
except the percentage of prospects asked to buy. They asked more
people to buy but this can be explained by the fact that they averaged
more calls per week. Although both groups of agents asked approximately
37 persons to buy insurance out of each 100 calls made, the
successful salesmen sold insurance to approximately 31 per cent of such
prospects as contrasted with the unsuccessful salesmen who sold to
approximately 17 per cent.

Although success in insurance selling is, in part, determined by the
denomination of the applications written by a salesman, other factors
are also important. While it is true that the average size of policy
written by the high and low groups of this study is different, the difference
between these averages only partially accounts for the difference between
the two groups with respect to the average weekly production figures.
Analysis revealed that the successful salesmen were actually able to sell
to a significantly larger percentage of persons called upon. In view of
the average number of applications written per week by both groups, the
successful group of salesmen would have been able to produce over twice
as much insurance written as the unsuccessful group, even if the size of
the application written had been exactly the same for both groups. The
successful group was therefore able, in terms of the number of persons
to whom they sold alone, to do more selling than the unsuccessful group.
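
The "over twice as much" statement follows from the Table 1 averages of 1.66 and .74 applications per week. The sketch below holds application size constant at an arbitrary figure; that common size is a hypothetical value chosen only to show that it cancels out of the ratio, and the names are illustrative.

```python
# Average applications written per week (Table 1).
HIGH_APPLICATIONS_PER_WEEK = 1.66
LOW_APPLICATIONS_PER_WEEK = 0.74


def weekly_production(applications_per_week, application_size):
    """Weekly written business if every application were the same size."""
    return applications_per_week * application_size


# Hypothetical common application size in dollars; any positive value
# gives the same ratio, because it cancels in the division.
common_size = 5000.0

ratio = (weekly_production(HIGH_APPLICATIONS_PER_WEEK, common_size)
         / weekly_production(LOW_APPLICATIONS_PER_WEEK, common_size))
print(f"High group would write {ratio:.2f} times as much as the low group")
```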


Table 1

Comparisons of Mean Differences Between High and Low Producing Groups of Agents
in Various Selling Activities and Personal History Items

                                    High Group      Low Group     Mean S.E. of
Item                                Mean     N     Mean     N    Diff.   Diff.    C.R.

Average Weekly Production
  (in dollars)                     8,602    26    2,181    26    6,421     546   11.76
Size of Application
  (in dollars)                     5,958    26    3,538    26    2,419     784    3.08
Average No. of Applications
  per Week                          1.66    26      .74    26      .93     .14    6.70
Average No. of Calls per Week      18.68    26    13.77    26     4.91    1.65    2.98
Average No. Asked-to-Buy
  per Week                          6.86    26     5.23    26     1.63    1.02    1.60
% of Applications to Total
  No. of Calls                      9.66    26     5.81    26     3.84    1.00    3.84
% Asked-to-Buy to Total
  No. of Calls                     37.50    26    37.82    26     -.31    4.70     .07
% of Applications to Total
  No. Asked-to-Buy                 30.87    26    17.16    26    13.71    3.87    3.54
Age*                               30.27    26    28.35  2[?]     1.92    1.70    1.13
Dependents*                         1.46    26     1.15   [?]      .31     .30    1.02
Monthly Living Expenses
  (in dollars)*                      209    26      187    23       22      19    1.11
Life Insurance Owned
  (in dollars)*                   16,544    25    9,158    24    7,386   1,587    4.65

* At entry into life insurance business.
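
The C.R. column of Table 1 appears to be the mean difference divided by its standard error, which a few rows confirm. The sketch below recomputes three of them and converts each critical ratio to a two-tailed probability by treating it as a normal deviate. That normal approximation is an assumption (with groups of about 26 a t distribution would be somewhat more conservative), and the function names are illustrative.

```python
import math


def critical_ratio(mean_diff, se_diff):
    """Critical ratio: difference between means over its standard error."""
    return mean_diff / se_diff


def two_tailed_p(cr):
    """Two-tailed probability of a normal deviate at least this extreme."""
    return math.erfc(abs(cr) / math.sqrt(2))


# (mean difference, S.E. of difference) as printed in Table 1.
rows = {
    "Average weekly production": (6421, 546),
    "% of applications to asked-to-buy": (13.71, 3.87),
    "Average no. asked-to-buy per week": (1.63, 1.02),
    "Life insurance owned": (7386, 1587),
}

for name, (diff, se) in rows.items():
    cr = critical_ratio(diff, se)
    print(f"{name}: C.R. = {cr:.2f}, two-tailed p = {two_tailed_p(cr):.4f}")
```

On this reading a critical ratio of about 1.96 corresponds to the five per cent level and about 2.58 to the one per cent level, which matches the way the text separates significant from non-significant rows.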


Differences between the two groups in question with respect to personal
history items revealed, as may be seen in Table 1, that of the four
items analyzed only one, namely the amount of life insurance owned at
the time of entry into selling, resulted in a critical ratio significant beyond
the one per cent level of confidence. Nevertheless, the average agent in
the successful group was found to be older, to have a greater number of
dependents, and to have a higher standard of living as determined by
living expenses per month. It is believed that investigation into personal
history items that appear among various selection devices would
reveal that the age factor is closely related to several other items commonly
employed in typical questionnaires such as number of dependents,
amount of insurance owned and so forth.


Table 2 shows differences between mean scores for the high and the
low-producing groups of agents for each of the nine areas dealt with by the
Kuder Preference test, and gives further the critical ratios for the significance
of the differences between the means obtained in these areas. The
highest critical ratio, 2.81, was obtained in the area entitled “Clerical.”
The successful salesmen scored significantly lower in this area
than the unsuccessful. The critical ratio for the persuasive component
was found to be only 1.92, thus significant at approximately the
[...] per cent level of confidence.

As evidenced by the critical ratios appearing in Table 2, no differences
significant beyond the ten per cent level of confidence were found to exist
between the mean scores for the high- and the low-producing salesmen for traits
measured by the Guilford-Martin Personnel Inventory Number I.

A critical ratio of 1.57, as shown in Table 2, resulted from a testing of
the significance of the difference between the average number of question-mark
responses appearing between the high and low-producing groups.

Table 2

Comparison of Mean Differences Between High and Low Producing Groups of
Agents in Various Psychological Measures

                                      High Group      Low Group     Mean S.E. of
Measure                               Mean     N     Mean     N    Diff.   Diff.    C.R.

Kuder Preference Record
  [...]                              63.73    26    57.87    23     5.86    5.42    1.08
  [...]                              29.88    26    32.87    23    -2.99    2.84    1.05
  [...]                              47.38    26    45.52    23     1.86    3.57     .52
  Persuasive                        112.12    26   104.22    23     7.90    4.12    1.92
  [...]                              41.46    26    37.17    23     4.29    3.60    1.19
  [...]                              48.31    26    53.70    23    -5.39    3.74    1.44
  [...]                              21.96    26    25.65    23    -3.70    2.30    1.61
  [...]                              78.04    26    76.43    23     1.60    4.29     .37
  Clerical                           47.96    26    56.09    23    -8.13    2.89    2.81
Guilford-Martin Personnel
Inventory Number I
  [...]                              49.79    24    54.67    24    -4.88    2.98    1.64
  [...]                              30.63    24    31.96    24    -1.33    2.45     .54
  [...]                              68.17    24    68.00    24      .17    3.92     .04
Undecided*                            8.42    24    12.83    24    -4.42    2.81    1.57
[...]                                21.32    25    22.08    25     -.76    1.60     .47
[...] Test)                          46.08    26    42.28    25     3.80    2.34    1.62

* As determined by the “?” count on the Guilford-Martin Personnel Inventory.


It was noted that two men in the low-producing group answered at least 
forty out of a total of 150 questions in the Inventory as undecided, which 
is an unusually large number in terms of the general distribution on the 
Inventory. However, what may prove to be an important finding is 
the fact that, when the entire group of agents is considered, the three 
men who received a score of 40 or over in question-mark responses, 
produced an average mean weekly production figure of $2,689 as con- 
trasted with the similar average of $5,281 for the 66 agents who scored 
at 31 and below. There were, moreover, no scores between 30 and 40 
for the whole group of agents in question. Although the number of 
agents in this study who scored unusually high on the measure designated 
as undecided is too small to allow one to place much confidence in it as an 
absolute finding, it is believed that further investigation along these lines 
with larger samples might prove fruitful. 

Even though no significant differences were found to exist between 
the high and the low groups of salesmen when measured by the scores 
derived from the Guilford-Martin Personnel Inventory, an item analysis 
was undertaken in the hope of uncovering certain items that might 
possibly discriminate between the two groups of salesmen. The eight 
items producing the highest D-Values were: 42, 48, 77, 83, 99, 103, 135, 
139. To all of these items, with the exception of item 83, the successful 
group of salesmen responded with a higher percentage of “Yes” answers 
than did the unsuccessful group of salesmen. Only four of these items 
were found to be significant beyond the five per cent level of confidence. 
These were items 42, 77, 135 and 139, having respective critical ratios of 
2.73, 2.14, 2.11, and 2.82, reflecting the significance of the differences be- 
tween the percentages of “Yes” responses to each item for the high and 
low producing groups of salesmen. However, four such items, significant 
at the five per cent level, would be expected to occur in a test of 150 items 
by chance alone. Still it is quite likely that one or more of these items 
might well continue to be discriminating and reliable items. Further 
investigation would probably shed more light on this question. 
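The chance-expectation argument above is simple arithmetic and can be sketched in a few lines of Python. The item count, significance level, and number of significant items come from the text; the function name is illustrative, and the calculation assumes each item is an independent test, which the study does not claim.

```python
def expected_chance_hits(n_items, alpha):
    """Number of items expected to reach significance by chance alone,
    assuming independent items (an assumption made for this sketch)."""
    return n_items * alpha

n_items = 150   # items in the Inventory, from the text
alpha = 0.05    # five per cent level, from the text
observed = 4    # items reported significant, from the text

expected = expected_chance_hits(n_items, alpha)
print(f"expected by chance: {expected:.1f}; observed: {observed}")
```

Since the observed count does not exceed the chance figure, the four items cannot be treated as established discriminators without cross-validation.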

No appreciable difference, as is evidenced by Table 2, was found to 
exist between the average mental ability of the successful and unsuccessful 
groups of salesmen. A slight relationship exists between the combination 
of personality characteristics measured by Part II of the Aptitude Index 
and life insurance production. 


Summary 


Based solely on the criterion of written business, and pertaining only 
to those particular life insurance salesmen investigated in this study, the 
following conclusions may be drawn. 




degree of success during approximately the first three months 
‘significantly better than chance basis for predicting the degree of 
n the life insurance selling at a later date. The correlation be- 


more weeks is +.55. 
2. Significant differences in favor of the successful agents were found 
between the two criterion groups with respect to the following 


    1. Average number of calls per week. 
    2. Number of applications written per 100 persons “Asked to buy.” 
    3. Number of applications written per 100 persons called upon. 
    4. Average size of application. 
    5. Average number of applications written per week. 


3. Non-significant differences in favor of the successful agents were 
found to exist between the two criterion groups with respect to the number 
of persons “asked to buy” insurance per week. Since the number of 
called upon was significantly higher for the successful groups the 
e of persons “asked to buy” per 100 called upon was almost 
for the two groups of salesmen. 
4. Of the four personal history items investigated, only one, namely, 
amount of insurance owned at entry, was found to differentiate significantly 
beyond the one per cent confidence level between successful and 
unsuccessful life insurance salesmen. The other three items, age at 
, number of dependents, and minimum living expenses per month, 
positive relationships to the criterion although no significant 
between the two groups in question was found to exist for 
measures. 
5. The findings of the present study indicate that the Kuder Preference 
Record, as commonly used, may identify life insurance salesmen 
but does not differentiate successful from unsuccessful agents. However, 
lysis of the present data indicates that there are inherent in the 
certain relationships with success in selling life insurance that - 
e to be useful in selecting high producing salesmen. 
6. No significant differences between the two criterion groups were 
found for any of the three component measures of The Guilford-Martin 
Personnel Inventory. A supplementary measure, degree of uncertainty, 
as determined from the number of question-mark responses, 
y showed no significant difference to exist. One unusual finding, 
, deserves mention: the three men in groups whose degree of- 
ty score was abnormally high were identified as producing very 
‘the mean of the total group. While this number is too small to 




permit generalization, it is suggested that such a score may well warrant 
further investigation. 

7. An item analysis of the 150 items of the Guilford-Martin Inventory 
revealed only four items which distinguished between the criterion groups 
significantly beyond the five per cent level of confidence. The result 
reflected by these four items may be considered to be well within chance 
expectation for a test of the present length. Nevertheless, further in- 
vestigation may possibly prove one or more of these items to be service- 
able enough to warrant their inclusion in a selective device. Although 
not a finding of the present study, it is believed possible that existing 
personality tests when carefully analyzed may reveal behavior patterns 
common to successful life insurance agents. It is also believed that 
unstructured or projective tests may prove of value by tapping those 
personality characteristics not capable of being identified by the usual 
structured test. 

8. No significant difference was found to exist between the mental 
ability test scores of the successful and the unsuccessful salesmen as 
measured by this tool; the mean scores of both criterion groups were for 
all practical purposes the same on The Adaptability Test. 

9. Although no significant difference was obtained between the mean 
raw scores of the two groups in question, trends present in the data 
indicate that Part II of the Aptitude Index may have some predictive 
value. 


Received August 19, 1948. 


References 


1. Barnes, D. F. The Purdue course in life insurance marketing. New York: The 
National Association of Life Underwriters, 1946, pp. 24. 
2. Guilford, J. P., and Martin, H. G. Guilford-Martin personnel inventory, manual of 
directions and norms. Beverly Hills, Calif.: Sheridan Supply Co., 1943, pp. 2. 
3. Kahn, D. F. An analysis of life insurance salesmen. Unpublished master’s thesis, 
Purdue University Libraries, West Lafayette, Indiana, 1946. 
4. Kelley, T. L. Selection of upper and lower groups for the validation of test items. 
J. educ. Psychol., 1939, 30, 17-24. 
5. Kuder, G. F. Intermediate manual for the Kuder preference record. Chicago: Science 
Research Associates, 1944, pp. 16. 
6. Lawshe, C. H., Jr. A nomograph for estimating the validity of test items. J. appl. 
Psychol., 1942, 26. 
7. Life Insurance Agency Management Association. The value and use of the Aptitude 
Index. Hartford, Conn.: Life Insurance Agency Management Assoc., 1946. 
8. Tiffin, J., and Lawshe, C. H. Preliminary manual for the Adaptability Test. Chicago: 
Science Research Associates, 1943. 


A Window-Stencil Method for Scoring the Strong 
Vocational Interest Blank (Men) 


J. E. Greene, R. T. Osborne, and Wilma B. Sanders 
The University of Georgia 


| psychologists and guidance workers as being one of the most 
ments for determining the vocational interests of male 
The nature of the standardization of the Strong Blank is 
n our belief, as to give it a higher degree of specific validity for 
jounselees than that which may be obtained from other tests of 
mal interest. On the other hand, many circumstances conspire 
t as frequent and effective use of the Strong Blank as its basic 
‘would seem to warrant. In many counseling situations, the 
r may wish to secure immediately Strong scores on selected 
ons for one or a relatively few clients. Under these circum- 
local machine scoring of the test is inadvisable. Moreover, if 
selor must send the answer sheet to some off-campus test scoring 
there will be an unwanted and often crucial delay in obtaining 
ults, When for either of these reasons machine scoring be- 
dvisable, the counselor must at present resort to the use of 
e, time-consuming and error-ridden process of hand scoring 
by means of the Strong ladder stencils, or of choosing some 
ly-scorable alternate test of vocational interest which often is less 
df particular purpose than the Strong test would be. 
he background for the development of the simplified scheme of 
oring the Strong Blank herein presented may be briefly stated. 
the senior author, while serving temporarily as Director of the 
Guidance Center of the University of Georgia, became im- 
with the local need for a simplified procedure for hand scoring 
Blank. A large proportion of our case load consisted of male 


K 


vas obvious that the Strong Vocational Interest Blank would 
valid and useful measures of vocational interest than 

other instrument commercially available. Consequently, 

thor set for himself the task of devising an accurate and quick ` 




procedure for hand scoring the Strong Blank.¹ The basic procedure 
consisted of the development of four window stencils to which were 
transferred the positive and negative weights assigned to each of the 
400 items of the Blank, for each Strong occupational category separately. 

Since its introduction, this window stencil scoring system has been 
used locally on more than 5000 cases. Our data indicate that a semi-skilled 
psychometrist can score the Strong Blank at the rate of 2½ 
minutes per occupation. In our own set-up, as well as in many similar 
counseling situations, the counselor typically will not need to have the 
Strong scored on all possible keys. Our experience indicates that for a 
particular client we seldom wish scores on more than six of the occupa- 
tional categories. Consequently, the total amount of scoring time for 
the typical client seldom exceeds fifteen minutes. Where large numbers 
of papers are to be scored on all possible keys, one of the Standard IBM 
methods or the Hankes system is more economical. In addition to 
offering less opportunity for addition errors, and other errors due to faulty 
alignment of the scoring stencils, the procedure herein described has the 
advantage of being more economical of time and money. For example, 
the Strong ladder scoring system presupposes that a Strong booklet (8¢ 
each) will be expended for each client, whereas under our system IBM 
answer sheets (IBM Form ITS 1100 B 360 Rev., at 2.35¢ each) are used 
and the booklet is not expended. In terms of clerical time involved, our 
window scoring stencil requires only approximately one-fourth as much 
time per occupation as does the Strong ladder scoring system. 
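The time and material figures in this paragraph can be tied together with a short Python sketch. It assumes the per-occupation rate is 2½ minutes, the reading that agrees with six occupations seldom taking more than fifteen minutes; the booklet and answer-sheet prices are those quoted in the text, and the function names are illustrative.

```python
MINUTES_PER_OCCUPATION = 2.5   # assumed reading of the per-occupation rate
BOOKLET_CENTS = 8.0            # Strong booklet, expended under ladder scoring
ANSWER_SHEET_CENTS = 2.35      # IBM answer sheet used with window stencils

def scoring_minutes(n_occupations, minutes_each=MINUTES_PER_OCCUPATION):
    """Clerical time to score one client on the chosen occupational keys."""
    return n_occupations * minutes_each

def material_saving_cents(n_clients):
    """Saving from expending an answer sheet instead of a booklet per client."""
    return n_clients * (BOOKLET_CENTS - ANSWER_SHEET_CENTS)

print(scoring_minutes(6))                      # six keys for a typical client
print(round(material_saving_cents(100), 2))    # saving over 100 clients, in cents
```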


¹ Our earliest scheme for using window stencils was devised by the senior author. 
The junior authors have subsequently refined and further simplified our earliest procedures. 
For example, our original procedure required 16 separate window stencils for 
each of the Strong keys, as follows: 

(a) 4 stencils for the positive weights on page 1 of the IBM Answer Sheet, a separate 
stencil for weight +1, +2, +3, and +4. 

(b) 4 stencils for the positive weights on page 2 of the IBM Answer Sheet, a separate 
stencil for weight +1, +2, +3, and +4. 

(c) 4 stencils for the negative weights on page 1 of the IBM Answer Sheet, a separate 
stencil for weight −1, −2, −3, and −4. 

(d) 4 stencils for the negative weights on page 2 of the IBM Answer Sheet, a separate 
stencil for weight −1, −2, −3, and −4. 

As contrasted with our earlier procedure which employed the 16 separate window 
stencils indicated above, the present system employs only 4 window stencils. The 
stencils indicated under (a), (b), (c), and (d) above have each been consolidated 
into a single stencil. As is indicated in Figure 1, all of the positive weights 
for page 1 of the answer sheet are shown on the same window stencil. Weights of +2, 
+3, and +4 are indicated to the right of the respective windows; all the remaining 
windows for this stencil have a weight of +1, but experience indicates that it is 
preferable not to show the weight of +1 to the right of the window. 




Development of the Window Stencil System 


oposed to describe our procedure in some detail so that persons 
sh to do so may prepare their own window stencils for as many 


As was implied above, our basic procedure consisted of 
transferring from the Strong ladder stencils for a given occupational 
category (e.g., Chemist) to our own window stencils for that same category 
the positive (+1, +2, +3, and +4) weights and negative (−1, 
−2, −3, and −4) weights assigned to any given response to each of the 
400 items in the Strong Blank. This process of transferring weights, 
the several steps indicated below, will be illustrated with the 
Chemist. 
1. Making use of page 1 (items 1-200) of the IBM Answer Sheet 
for the Strong Vocational Interest Blank for Men (Revised) Form M² for 
recording the value of the various weights involved, we transferred from 
the Strong ladder stencil for Chemist all the +1, +2, +3, and +4 
weights which Strong assigned to these 200 items on the Chemist scale.³ 
In the same manner, all the positive weights assigned to items 201-400 
were recorded on page 2 of a second Strong Answer Sheet. Thus these 
two separate recordings carried all the positive weights assigned to 
Chemist in the 400 items of the Strong Blank. Similarly, all the −1, 
−2, −3, and −4 weights assigned to items 1-200 were recorded on page 
1 of a third answer sheet and the negative weights assigned to items 
201-400 were recorded on page 2 of a fourth answer sheet. This procedure 
resulted, therefore, in transferring from 9 ladder stencils (each having 3 
separate columns of weights of varying size and sign) to 4 answer sheets 
all the positive and negative weights assigned to Chemist on the 400 items 
comprising the Strong test. 
2. The final step involved in preparing the 4 window stencils to 
the 9 ladder stencils for the Chemist scale required little time 
terial. In punching all of our window stencils we used the 
heavy cardboard form, International Test Scoring Machine 


² IBM Form ITS 1100 B 360 Rev. Copyrighted by the Board of Trustees of Leland 
Stanford Junior University. 
stice, this procedure of transferring weights will be facilitated if two persons 
in the following manner: One person will apply the ladder stencil for Chemist, 
ng Interest Blank booklet and read off the +1, +2, +3, and +4 weights 
each response to items 1-200. For example, “Item 1—no weight; Item 2— 
item 3—1, +1; Item 4—no weight; Item 5—no weight; Item 6—L, +2; Item 
1; Item 8—D, +2; Item 9—no weight; Item 10—L, +4; etc.” The second 
ord these positive weights in the appropriate spaces on page 1 of the answer 
appropriate weights for Stencils B, C, and D will be determined similarly. 




Key Form A.⁴ Each of the 4 answer sheets described above was used 
as a basis for punching a window stencil to which the weights recorded 
on the answer sheets were accurately assigned. For example, the 
answer sheet which recorded the +1, +2, +3, and +4 values on items 
1-200 was fitted exactly against the back⁵ of one of the cardboard forms 
1000 A 310 and appropriate response positions were punched through 
both the answer sheet and the cardboard form with a pin. Then, working 
from the front (i.e., printed) side of Form 1000 A 310, each circle⁶ through 
which the pin had been punched was converted into a “window” with an 
IBM hand punch. The weight of each response to each item was indi- 
cated according to the procedure described in footnote number 1 and 
illustrated in Figure 1. 




Fig. 1. Man chemist, stencil A: plus weights, items 1-200. 
(For page 1 of Answer Sheet.) 


„ Note: If sufficient demand should develop for ti 3 x ment 
will probably be made with Stanford a SA ie aioe coe arrange 


For each occupational scale, 4 window stencils were prepared, as 
follows: 


⁴ IBM Form ITS 1000 A 310. These forms may be procured from the International 
Business Machines Corporation. Cost, 2.3¢ each. 

⁵ The back rather than the front of the cardboard form was used in order to reduce 
the amount of eye strain in scoring. The circles on the front of the cardboard form 
tend to produce mental confusion and fatigue of the eye muscles. 

⁶ These circles were used as guides in securing accuracy in punching. 




A. Positive weights, page 1 of Answer Sheet (items 1-200); 
B. Positive weights, page 2 of Answer Sheet (items 201-400); 
C. Negative weights, page 1 of Answer Sheet (items 1-200); 
D. Negative weights, page 2 of Answer Sheet (items 201-400). 


Procedures for Window-Stencil Scoring 


len) becomes greatly simplified. Obviously, to evaluate the 

est in a given Strong occupational category, it is necessary 
the algebraic sum of the positive and negative weights which 
n that scale. For any given scale, the sum of his positive 
weights may be quickly determined by applying window stencil A to 
page 1 of the answer sheet and window stencil B to page 2 of the answer 
sheet. Similarly, the sum of his negative weights may be obtained by 
appropriate application of window stencils C and D. The algebraic 
sum of these two sums constitutes his total raw score on the given 
occupational category. The raw score thus obtained corresponds exactly 
to the raw score obtainable by ladder stencil or machine scoring procedures 
and may be interpreted accordingly. 
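The scoring rule just described, an algebraic sum of the weights seen through stencils A and B (positive) and C and D (negative), can be sketched in Python as follows. The data layout, the function names, the response letters L, I, and D, and every weight shown are hypothetical choices made for illustration; they are not the published Chemist key.

```python
def stencil_sum(stencil, responses):
    """Add the weights visible through one stencil.

    stencil maps (item number, response letter) to a weight;
    responses maps item number to the letter the client marked.
    """
    return sum(weight for (item, letter), weight in stencil.items()
               if responses.get(item) == letter)

def raw_score(stencils, responses):
    """Algebraic sum over all four stencils for one occupational scale."""
    return sum(stencil_sum(s, responses) for s in stencils)

# Hypothetical weights for illustration only.
stencil_a = {(6, "L"): 2, (8, "D"): 2, (10, "L"): 4}   # positive, items 1-200
stencil_b = {(201, "L"): 1}                            # positive, items 201-400
stencil_c = {(6, "D"): -1, (12, "L"): -3}              # negative, items 1-200
stencil_d = {(201, "D"): -2}                           # negative, items 201-400

# Hypothetical answer sheet: item number -> marked response.
responses = {6: "L", 8: "I", 10: "L", 12: "L", 201: "D"}

score = raw_score([stencil_a, stencil_b, stencil_c, stencil_d], responses)
print(score)
```

Keeping one mapping per stencil mirrors the physical procedure, in which each stencil is laid over the answer sheet once and only the marks showing through its windows are counted.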


Evaluation 


gh a considerable amount of exacting work was involved in our 
ion of the window stencils herein described, it has been our ex- 
that this labor expenditure was of minor significance in com- 
to the vast and varied benefits which we have derived from their 
udgetary terms, two types of savings have been notable: (1) 
decrease in clerical time involved in window stencil scoring as 

with ladder stencil scoring; (2) use of IBM Answer Sheets 
of expendable Strong booklets has markedly reduced the per 
of testing materials and has thus permitted much more ex- 
of the Strong test than otherwise would have been feasible. 
y, our experience indicates that the margin of error in scoring the 
our window stencil procedure is markedly less than that ob- 
hen the Strong ladder stencil system is used. 


A Short Test of Mental Ability 


Jay L. Otis and David J. Chesler 
Personnel Research Institute, Western Reserve University 


A survey of 26 paper-and-pencil tests of mental ability suitable for 
use at the adult level, practically all of which are listed in the Nineteen 
Forty Mental Measurements Yearbook (1), showed that the range of 
“examination time” varied from 12 to 153 minutes. Five of these tests 
required 16 minutes or less. The median examination time was 32 
minutes. It would seem that there are few short tests of mental ability 
suitable for adults—‘“short” in this connection being defined as approxi- 
mately 15 minutes or less. 

While, in general, no claims of superiority with respect to reliability 
or validity can be made for the short test as compared with a longer 
test, nevertheless the short test has demonstrated its usefulness and 
practicability, and, in many instances, certain advantages over the 
longer test. In the industrial employment office, where time is often at 
a premium, the short test of mental ability can yield results of more 
than acceptable validity with respect to the types of jobs and individuals 
involved. In those situations where the standards are more precise, 
the short test may be used to screen out those individuals who are obviously 
below or above the desired mental standards, so that a longer test 
of mental ability and tests for other functions will be reserved for those 
who fall within the accepted range. This is an economical procedure, 
both to the applicant and to the organization. The applicant who 
cannot possibly qualify is prevented from embarking on a lengthy 
testing program, and the organization is saved the time and expense involved 
in administering and scoring a complete test battery. 

In the vocational guidance situation, the short test of mental ability 
has very useful application, also, in that it may be an excellent indicator 
of the type (e.g., “elementary,” “intermediate,” or “advanced”) of 
longer test that should be administered. It is not an uncommon ex- 
perience with psychometrists and vocational counselors to realize that 
the counselee has taken, or is in the process of taking, a test of mental 
ability which is inappropriate to his level. A short test of mental ability 
used as a “pre-test” will prevent this from happening. A knowledge of the 
testee’s intelligence, obtained before the test battery is decided upon, is 
also extremely helpful in determining what special tests of aptitude and 



should be administered. For example, in the case of a 
who wants to go to college, but whose pre-test shows him to be 
ly below average in intelligence, tests of aptitude should be 
d which are more applicable to a lower level of employment 
ing, and tests applicable at the college level should be omitted. 
pre-test serves other purposes in the counseling situation. 
ered before the initial interview, it provides clues as to the 
to which the counselee will understand the verbal give-and-take 
tial interview. It can also be a fast and reliable determiner of 
ity for an individual rather than a group test of mental ability. 
these reasons the Personnel Research Institute of Western Re- 
niversity initiated in 1942 a research project with the purpose of 
ing a short test of mental ability. The Personnel Research 
te was in an excellent position to undertake this project since it 
up to carry on personnel research in such areas as the development 

lures for employment and training of workers, as well as the 
pment of techniques in the field of vocational guidance (2). The 
es of the Personnel Research Institute solved in large part the 
m of obtaining suitable populations for the standardization and 
of a new test. The result of this research is the Classification 


Industrial and Office Personnel (3). 


Description of the Test 


Classification Test for Industrial and Office Personnel is primarily 
e of mental ability at the adult level, although evidence has 
ted that it is also satisfactory for use at the high school level. 
self-administering group test and intended for individuals who 
how to read. An attempt was made to include items of approxi- 
7 uniform difficulty throughout the test and to keep the difficulty 
relatively low. Most group tests of intelligence present items in 
er of increasing difficulty. This is often discouraging to the ordinary 

‘office worker. In addition, an increasing order of difficulty tends 
luce the total number of items required. In a short test of mental 
constructed on this basis many subjects reach their difficulty 


a very short time (perhaps 7 or 8 minutes) so that the effective 


of items is reduced still further. In the standardization of such 
pletion of even two or 


in the standard score or 
In other words, the individual who “gets stuck” on 




contains 100 items, which are as many as appear in longer tests. Indi- 
viduals at the college level will often answer correctly as many as 90 items 
and a small number of individuals (about 4 per cent) at this level will just 
about succeed in attempting every item in the maximum time allowed. 

Type and Arrangement of Items. The 100 items are spiralled in series 
of five as follows: vocabulary, general information, arithmetic, general 
information, and analogies. There is thus a total of 40 general informa- 
tion items and 20 each of vocabulary, arithmetic, and analogies. The 
entire test is contained in a four-page booklet with the directions and 
practice problems on the first page and the test items on pages 2, 3, and 4. 
All of the items are of the multiple-choice type, with four alternates. 
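The spiral arrangement can be reproduced in a few lines of Python to confirm the item counts given above. The five-item cycle and the total of 100 items come from the text; the function and variable names are illustrative.

```python
from collections import Counter

# One cycle of the spiral, in the order stated in the text.
PATTERN = ["vocabulary", "general information", "arithmetic",
           "general information", "analogies"]

def spiral_items(n_items=100, pattern=PATTERN):
    """Item types in test order, repeating the cycle until n_items is reached."""
    return [pattern[i % len(pattern)] for i in range(n_items)]

items = spiral_items()
counts = Counter(items)
print(counts)
```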

Time Limits. The time limit of the Classification Test for Industrial 
and Office Personnel has been kept to a minimum to make it practical 
to use in the employment situation. It is possible to use a time limit 
of either 10 or 15 minutes. The 15-minute time limit is recommended 
since the norms for this time limit are based on an appreciably larger 
number of cases than for the 10-minute period. 

Standardization. Originally the test was administered in tentative 
form to over 3000 subjects. These included general college students, 
engineering college students, evening college students, high school stu- 
dents, nursing school applicants, clerical workers from typical manu- 
facturing establishments, salesmen, and factory workers. The test went 
through two mimeographed versions and one printed version on an experi- 
mental basis before it was published in its final form. 

Reliability and Equivalence of Forms. The odd-even reliability of the 
test, as corrected by the Spearman-Brown formula, is .94. Two forms 
of the test, A and B, are available. A correlation of .86 between the two 
forms was obtained when they were administered in A-B order to a 
group consisting of 90 academic high school students and 159 college 
students. A correlation of .85 was obtained for a group of 72 commercial 
high school students. Correlations of .80 and .82 were obtained for 
similar groups who took the tests in B-A order. 
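The Spearman-Brown correction mentioned above steps a half-test correlation up to the reliability of the full-length test. The Python sketch below states the general formula and inverts its two-halves case to show the odd-even correlation that a corrected value of .94 implies; that implied figure is derived here and is not reported in the text, and the function names are illustrative.

```python
def spearman_brown(r, k=2.0):
    """Reliability of a test lengthened k times, given reliability r of one part."""
    return k * r / (1.0 + (k - 1.0) * r)

def half_test_r(r_full):
    """Invert the k = 2 case: half-test correlation implied by a corrected value."""
    return r_full / (2.0 - r_full)

r_half = half_test_r(0.94)
print(round(r_half, 3))                  # implied odd-even correlation
print(round(spearman_brown(r_half), 2))  # stepped back up to full length
```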

The differences in difficulty between the two forms are practically 
negligible and appear to approach the minimum that can be expected. 
For a group of 389 academic high school and college students, the dif- 


job or school performance. These validity coefficients are presented in 
test is short and does not cover the entire range of 




ity, correlations between it and longer tests of intelligence are 
as are usually obtained between longer tests of intelligence. 
seen from Table 1, the test has demonstrated low but positive 
the industrial situation and somewhat better validity in the 
school situation. It would appear, however, that because 
time limit, the test is appropriate as part of a battery designed 
al or school use. It is of interest to note that a critical norm 
established in two validity studies. In the first of these the 
; used as part of a battery to select salesmen. It was found that 
ring below 40 were difficult to train and inferior in sales perfor- 
In the second study it was found that men scoring below 40 
risks for the job of bus or street car operator. 
The following norms are available: Adult (N = 1662); 
‘college (N = 946); engineering college (N = 113); evening college 
; high school (N = 383); nursing school applicants (N = 254); 
workers (N = 137); sales personnel (N = 225); factory workers 
); general population (N = 6007). 


Table 1 
Validity Coefficients for the Classification Test for Industrial and Office Personnel 


  N      r     Criterion 

Other Tests 
 191    .69    A.C.E., 1942 Edition 
  83    .75    Otis S-A, Higher Forms B and D 
 100    .80    Otis S-A, Higher Forms A, B, and D 
 149    .69    Otis S-A, Higher Form D 
 105    .76    A.C.E., 1941 Edition 
 254    .62    California Mental Maturity, Form A 
  44    .83    Otis S-A, Higher Form D 

School Course Grades 
 123    .46    Business Information and Mathematics 
 126    .27    Typing 
  53    .37    Bookkeeping 
  53    .38    Stenography 
  46    .47    Office Production 
  46    .56    Filing 
  39    .46    Machine Calculation 

Job Performance 
  45    .31    Ratings of sales performance 
  45    .21    Total sales for two years 
  79    .44    Sales ability (biserial r) 
  44    .49    Job rating 
               Progress rating 




Summary 

A short test of mental ability has been described which, it is felt, is 
very appropriate for use in the industrial and vocational guidance situa- 
tions. This test is the Classification Test for Industrial and Office Per- 
sonnel, Forms A and B. 

The distinguishing characteristics of this test are: (1) a short time 
limit; (2) a large number of items of approximately uniform difficulty, 
rather than a small number of items presented in order of increasing 
difficulty. It is believed that this sort of mental ability test is more suit- 
able to the typical office or factory employment situation than the usual 
type of intelligence test. 

At the present writing the test has been standardized on over 6000 
subjects. The odd-even reliability is .94, and the correlation between 
alternate forms varies from .80 to .86. Differences in difficulty between 
the two forms are practically negligible. Norms are available for nine 
different industrial and school populations. Validities with other, longer, 
tests of mental ability range from .62 to .83. Validities with grades in 
commercial high school courses range from .27 to .56. Validities with 
various criteria of job performance range from .21 to .49. 

Received October 1, 1948. 

References 

1. Buros, O. K., Ed. The nineteen forty mental measurements yearbook. Arlington, 
Va.: Gryphon Press, 1945. 
2. Otis, J. L. The Personnel Research Institute of Western Reserve University. J. 
consult. Psychol., 1946, 10, 131-135. 
3. Otis, J. L., et al. Classification test for industrial and office personnel (Forms A 
and B). Cleveland, Ohio: Western Reserve University Press, 1947. 




Abbreviated Job Evaluation Scales Developed on the Basis 
of “Internal” and “External” Criteria 


David J. Chesler 
Personnel Research Institute, Western Reserve University 


In recent years much of the published material in the field of job 
evaluation which might properly be designated as “research” has been 
concerned with abbreviated job evaluation scales. Most of this work has 
been performed by Lawshe and various associates (4, 5, 6, 7). The 
writer (3) has also presented some findings on this topic. All of these 
studies utilized the Wherry-Doolittle selection method (8) to derive the 
abbreviated scales. The procedure has been to apply the Wherry-Doolittle 
process to the factors or “rating scale items” which comprise a 
job evaluation scale, and to identify the first three or four factors in the 
scale which contribute most to the ratings which jobs receive on the scale. 
The ratings predicted from these three or four factors are then compared 
with the ratings received on all of the original factors. The criterion 
is the original job evaluation scale from which the abbreviated scale was 
derived. 
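The selection procedure described above can be illustrated with a simplified forward-selection sketch in Python. At each step it adds the factor that most raises the multiple correlation R with the criterion, computed from a correlation matrix. This is not the Wherry-Doolittle computation itself: it omits the shrinkage correction usually associated with that method, and the correlations below are hypothetical numbers chosen for illustration, not data from any of the studies cited.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def multiple_r_squared(rxx, rxy, chosen):
    """Squared multiple correlation of the criterion with the chosen factors."""
    sub = [[rxx[i][j] for j in chosen] for i in chosen]
    rhs = [rxy[i] for i in chosen]
    beta = solve(sub, rhs)
    return sum(b * r for b, r in zip(beta, rhs))

def forward_select(rxx, rxy, k):
    """Greedy selection of k factors; returns (factor index, R so far) per step."""
    chosen, history = [], []
    while len(chosen) < k:
        remaining = [i for i in range(len(rxy)) if i not in chosen]
        best = max(remaining,
                   key=lambda i: multiple_r_squared(rxx, rxy, chosen + [i]))
        chosen.append(best)
        history.append((best, multiple_r_squared(rxx, rxy, chosen) ** 0.5))
    return history

# Hypothetical intercorrelations among three factors and with the criterion.
rxx = [[1.0, 0.5, 0.2],
       [0.5, 1.0, 0.3],
       [0.2, 0.3, 1.0]]
rxy = [0.8, 0.6, 0.4]

history = forward_select(rxx, rxy, 2)
for factor, r in history:
    print(factor, round(r, 3))
```

In this toy example the second factor chosen is the one with the lower criterion correlation, because it overlaps less with the factor already selected. The same effect explains why the factors picked can change when the criterion changes.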
The present study has attempted to answer the question as to which 
three or four factors in a job evaluation scale would be identified if 
another job evaluation scale were used as the criterion. Such a criterion 
has been designated throughout this report as an “external” criterion, 
in contrast to the rating on the original manual, which may be designated 
an “internal” criterion. Will similar abbreviated scales emerge when 
various job evaluation manuals constitute the external criteria? It is 
ed that a study of this sort offers a method of analyzing the differ- 
es between two job evaluation manuals. Specifically, it answers the 
ion of what factors in one job evaluation system constitute the best 
of another system. i 
Method 
Job raters in three industrial organizations rated independently descriptions 
and specifications for 35 “standard” salaried jobs on a “standard” 
job evaluation manual and on their own respective company 
manuals. The jobs, the standard manual, the company manuals, and the 
analysts involved are the same as those reported in previous studies 




Results and interpretation 


Standard Manual Factors Identified with Internal Criterion. As re- 
ported previously (3), the Wherry-Doolittle selection method was applied 
to the standard manual factor ratings submitted by the raters in the three 
companies, with total rating on the standard manual as the (internal) 
criterion. With the internal criterion the first four factors identified 
with each of the three groups of raters were the same, although the order 
of identification was not the same.1 These four factors were: “Work
experience”; “character of supervision received”; “character of super- 
vision given”; and “responsibility for confidential matters.” 


Table 1
Abbreviated Scales Derived from Standard Manual with External Criteria in Three
Companies by Raters Who Rated the Standard Jobs on the Standard
Manual and on Their Respective Company Manuals

       Co. A                  Co. B                  Co. C
Factor No.     R       Factor No.     R       Factor No.     R
     5       .839           6       .854           5       .905
     2       .921           9       .926           4       .959
     6       .941          10       .945           2       .969
    10       .956           8       .956          11       .974
    11       .958                                  8       .977
                                                   7       .978
                                                  10       .979
                                                  12       .979
                                                   6       .978

Key to Factor Numbers: 2. Essential knowledge and training; 4. Character of
supervision received; 5. Character of supervision given; 6. Number supervised; 7. Re-
sponsibility for funds, securities, and other valuables; 8. Responsibility for confidential
matters; 9. Responsibility for getting along with others; 10. Responsibility for accu-
racy (effect of errors); 11. Pressure of work; and 12. Unusual working conditions.


Standard Manual Factors Identified with External Criterion. The
procedure followed in the present study was to apply the Wherry-
Doolittle selection process to the factors of the standard manual, with
total ratings on a company manual as the (external) criterion. The
results are summarized in Table 1.

Since comparisons of abbreviated scales have previously (3) been
made on the basis of the first four factors identified, we shall concern
ourselves in Table 1 only with the first four factors identified in each


1 For a more complete discussion of these findings, see a previous study (3). 


Abbreviated Job Evaluation Scales 153 


The striking feature of the abbreviated scales that emerge
with different external criteria is their dissimilarity, as contrasted with
the striking similarity of the abbreviated scales that emerged with the
same internal criterion (3). The number of times certain factors were
identified among the first four factors for the three groups of raters may
be summarized as follows:

Factor                                                     No. Times
Number   Factor                                            Identified
  2.     Essential knowledge and training                      2
  4.     Character of supervision received                     1
  5.     Character of supervision given                        2
  6.     Number supervised                                     2
  8.     Responsibility for confidential matters               1
  9.     Responsibility for getting along with others          1
 10.     Responsibility for accuracy (effect of errors)        2
 11.     Pressure of work                                      1

         Total                                                12


Out of a possible total of twelve factors, eight appeared either once
or twice. It is interesting that no single factor emerged three times, that
is, in each of the abbreviated scales derived with an external criterion.
Of the eight factors, three (“character of supervision received,” “character
of supervision given,” and “responsibility for confidential matters”)
were also identified in abbreviated scales derived with the internal cri-
terion. It would appear that these three factors are important, not
only in the standard manual, but also in some form or other in the
manuals used in the three companies. The fact that two “supervisory”
factors were identified, not only with the internal criterion, but also with
different external criteria would indicate that in the standard and
company manuals factors concerned with supervision are very important.

It would seem that the results obtained with external criteria indicate
primarily essential differences among the company manuals, as analyzed
in terms of the standard manual factors. This may be contrasted with
the results obtained with the same internal criterion, which indicate
primarily differences among the raters (3).

Adequacy of Abbreviated Scales Derived from Standard Manual with
External Criterion in Predicting External Criterion. As in the case of
abbreviated scales derived with internal criteria (3), an analysis was made
of the accuracy with which the abbreviated scales, derived from the
standard manual with external criteria, predict the external criteria,
that is, total ratings on the company manuals.




The multiple regression equations for predicting total points on the 
company manuals from point ratings on the selected standard manual 
factors were computed and applied to the ratings given on the selected 
standard manual factors by the raters in the three companies. Three 
sets of predicted company manual ratings were thus obtained. 

The actual classification plans of the three companies (see Table 2)2
were used to study the comparative adequacies of the three abbreviated 
scales. The labor grades within each of the company plans are unequal 
and follow roughly a geometric rather than an arithmetic progression. 

Table 3 shows the per cent of jobs in each instance which remained 
in the same labor grade, or which were displaced into another labor grade. 
In companies A, B, and C, respectively 88.5 per cent, 91.4 per cent, and 
97.1 per cent of the jobs remained in the same labor grade or were dis- 
placed into a labor grade adjacent to that of the original classification. 
In all three companies some jobs were displaced by two or three labor 
grades. 
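The displacement tabulation described above can be sketched as follows. The grade boundaries, point values, and function names are hypothetical illustrations (the companies' actual classification plans are in the deposited Table 2); only the procedure of grading actual and predicted point totals and counting grade shifts follows the text.

```python
from bisect import bisect_right

def labor_grade(points, lower_bounds):
    """Grade index for a point total; lower_bounds are ascending grade floors."""
    return bisect_right(lower_bounds, points)

def displacement_summary(actual, predicted, lower_bounds):
    """Per cent of jobs displaced by 0, 1, 2, ... labor grades."""
    counts = {}
    for a, p in zip(actual, predicted):
        shift = abs(labor_grade(p, lower_bounds) - labor_grade(a, lower_bounds))
        counts[shift] = counts.get(shift, 0) + 1
    total = len(actual)
    return {shift: 100.0 * c / total for shift, c in sorted(counts.items())}

# Hypothetical grade floors in roughly geometric progression.
bounds = [100, 120, 144, 173, 207]
actual = [105, 130, 150, 180, 210]      # original company manual totals
predicted = [110, 118, 175, 181, 150]   # totals predicted from abbreviated scale
summary = displacement_summary(actual, predicted, bounds)
```

The sum of the entries for shifts of 0 and 1 corresponds to the "same or adjacent labor grade" percentages reported in the text.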

Table 4 shows how ratings on the abbreviated scales derived with 
external criteria deviated by the point value of 0.5 labor grade, 1.0 labor 
grade, or more than 1.0 labor grade from total ratings on the original 
(company) manuals. In the three companies respectively 31.4 per cent, 
54.3 per cent, and 68.6 per cent of the predicted ratings deviated from the 
original ratings by the point value of 0.5 labor grade or less. Similarly 
65.6 per cent, 85.7 per cent, and 88.6 per cent of the predicted ratings 
deviated from the original ratings by the point value of 1.0 labor grade
or less. In other words, for three companies respectively 34.4 per cent, 
14.3 per cent, and 11.4 per cent of the predicted ratings deviated from 
the original ratings by a point value greater than one labor grade. 

Adequacy of Abbreviated Scales Derived from Standard Manual with 
Internal Criterion in Predicting External Criterion. The first four factors 
of the standard manual consistently identified by the Wherry-Doolittle 
selection process with total rating on the standard manual as the (internal) 
criterion were factors 1, 4, 5, and 8, that is “work experience,” “character 
of supervision received,” “character of supervision given,” and “re- 
sponsibility for confidential matters” (3). These factors might be de- 
scribed as the primary factors of the standard manual because, when 
weighted properly, they are the “best measure” of total ratings on the 
standard manual. It is of interest to know how well this best measure 
of a manual measures total ratings on other manuals. 

2 Tables 2 to 6 inclusive have been deposited with the American Documentation
Institute. Order Document 2558 from American Documentation Institute, 1719 N St.,
N.W., Washington 6, D. C., remitting $0.50 for microfilm (images 1 inch high on stand-
ard 35 mm. motion picture film) or $0.70 for photocopies readable
without optical aid.




The specific question to be answered here is how well do the abbrevi-
ated scales derived from the standard manual with total ratings on the
standard manual as the (internal) criterion predict the external criterion,
that is, company manual ratings.

The multiple regression equations for predicting company manual
ratings from factors 1, 4, 5, and 8 of the standard manual were computed
and applied to the ratings given on these factors by the raters in the
three companies. Three sets of predicted company manual ratings were
thus obtained.

Again, the actual classification plans of the three companies (see
Table 2) were used to study the comparative adequacies of the abbrevi-
ated scales. Table 5 shows the per cent of jobs in each instance which
remained in the same labor grade or which were displaced into labor
grades one, two, or more grades removed from that of the original classifi-
cation. In companies A, B, and C, respectively 68.5 per cent, 94.2
per cent, and 91.4 per cent of the jobs remained in the same labor grade
or were displaced into a labor grade adjacent to that of the original
classification.

Table 6 shows how predicted company manual ratings based on the
abbreviated scales derived with internal criteria deviated by the point
value of 0.5, 1.0, or more than 1.0 labor grade from total ratings on the
original (company manual) scales. In the three instances 11.5 per cent,
[figure illegible] per cent, and 60.0 per cent of the predicted ratings deviated from the
original ratings by the point value of 0.5 labor grade or less; and 42.8
per cent, 71.4 per cent, and 91.4 per cent of the predicted ratings deviated
from the original ratings by the point value of 1.0 labor grade or less.

Comparison of All Abbreviated Scales Derived from Standard Manual.
In both the present and in a previous study (3) abbreviated scales have
been derived from a standard manual, with an internal criterion (standard
manual total rating) and with an external criterion (company manual
total rating). Table 7 summarizes the data required to form an opinion
as to the relative adequacies of these abbreviated scales in predicting
either the internal or external criterion.

In terms of the multiple coefficient of correlation and the index of fore-
casting efficiency the adequacy of prediction is clearly in the hierarchy:

1. Abbreviated scales derived with internal criterion and used to
predict the internal criterion.

2. Abbreviated scales derived with external criterion and used to
predict the external criterion.

3. Abbreviated scales derived with internal criterion and used to
predict the external criterion.




Table 7*

Adequacy of Abbreviated Scales Derived from Standard Manual with Internal
Criterion (Standard Manual Ratings) and External Criterion
(Company Manual Ratings)

                                         Derived with   Derived with   Derived with
                                         internal       external       internal
                                         criterion;     criterion;     criterion;
                                         used to        used to        used to
                                         predict        predict        predict
                                         internal       external       external
                                   Co.   criterion      criterion      criterion
Multiple coefficient of corre-      A       .98            .96            .91
lation (R)                          B       .98            .96            .88
                                    C       .99            .98            .97
Index of forecasting efficiency     A       .79            .72            .59
(E)                                 B       .81            .72            .52
                                    C       .85            .78            .75
% jobs remaining in same, or        A     100.0           88.5           68.5
displaced into adjacent, labor      B     100.0           91.4           94.2
grade                               C     100.0           97.1           91.4
% predicted ratings deviating by    A      94.2           65.6           42.8
value of 1.0 labor grade or less    B      94.2           85.7           71.4
from original ratings               C      97.1           88.6           91.4

* See footnote 2.


This hierarchy is apparent from the fact that all R’s and E’s decrease 
as one reads across each row. 
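As a check on the relation between the two indices in Table 7, the index of forecasting efficiency is conventionally computed from a correlation as E = 1 - sqrt(1 - R^2). The short sketch below applies that standard formula; the function name is an illustrative choice. Because the tabled R's are rounded to two places, values recomputed from them may differ from the tabled E's by a point or so in the second decimal.

```python
import math

def forecasting_efficiency(r):
    """Index of forecasting efficiency, E = 1 - sqrt(1 - r^2)."""
    return 1.0 - math.sqrt(1.0 - r * r)

# R's for Co. A from Table 7 (internal/internal, external/external, internal/external)
e_values = [forecasting_efficiency(r) for r in (0.98, 0.96, 0.91)]
```

The formula makes plain why small drops in R near 1.00 produce large drops in E: a fall from .98 to .91 in R roughly corresponds to a fall from about .80 to about .59 in E.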

In terms of the percentage of jobs remaining in the same labor grade or
displaced into an adjacent labor grade, this hierarchy holds for companies
A and C, but not for Co. B. However, in the case of Co. B the discrep-
ancy is due to a difference of only one job.

In terms of the percentage of predicted ratings deviating by the value
of one labor grade or less from the original ratings, the hierarchy holds for
companies A and B, but not for Co. C. However, here again the dis-
crepancy is due to a difference of only one job.
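The "one job" statements can be verified by converting the tabled percentages back into counts of the 35 standard jobs. The helper below is only an illustrative sketch; its name is not from the source.

```python
def jobs_from_percent(percent, n_jobs=35):
    """Number of jobs corresponding to a tabled percentage of n_jobs."""
    return round(percent * n_jobs / 100.0)

# Co. B, same or adjacent labor grade: external/external vs. internal/external
co_b = (jobs_from_percent(91.4), jobs_from_percent(94.2))
# Co. C, within one labor grade in points: external/external vs. internal/external
co_c = (jobs_from_percent(88.6), jobs_from_percent(91.4))
```

With 35 jobs, a single job amounts to about 2.9 percentage points, which is the size of both reversals in Table 7.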


Summary 


1. The basic methodological feature of the present study was to have
raters in three companies evaluate a standard set of descriptions and
specifications for 35 salaried jobs on a standard job evaluation manual
and on their own respective company manuals.

2. The Wherry-Doolittle selection method was applied to the standard
manual factor ratings submitted by the raters in each company, with total
rating on the standard manual as the (internal) criterion. The
first four factors identified were the same for each group of raters, al-
though the order of identification was not the same (3). These results
indicate primarily differences among raters.

3. The Wherry-Doolittle selection method was again applied to the
standard manual factor ratings submitted by the raters in each company,
with total ratings on the respective company manuals as the (ex-
ternal) criterion. Out of a possible total of twelve factors, eight were
identified among the first four for all three groups of raters. The striking
feature of the abbreviated scales derived with external criteria is their
dissimilarity, as contrasted with the striking similarity of the abbrevi-
ated scales that emerged with the same internal criterion. These results
indicate primarily differences among the company manuals, as analyzed
in terms of the standard manual factors.

4. An analysis of the adequacy of the abbreviated scales derived from
the standard manual with internal and external criteria in predicting the
internal and external criteria indicates in general the following hierarchy
of accuracy of prediction with abbreviated scales:

a. Derived with internal criterion and used to predict the internal
criterion.

b. Derived with external criterion and used to predict the external
criterion.

c. Derived with internal criterion and used to predict the external
criterion.

References

1. Benge, E. J., Burk, S. L. H., and Hay, E. N. Manual of job evaluation. New York:
Harper & Brothers, 1941.

2. Chesler, D. J. Reliability and comparability of different job evaluation systems.
J. appl. Psychol., 1948, 32, 465-475.

3. Chesler, D. J. Reliability of abbreviated job evaluation scales. J. appl. Psychol., 1948, 32.

4. Lawshe, C. H., Jr. Studies in job evaluation: II. The adequacy of abbreviated
point ratings for hourly-paid jobs in three industrial plants. J. appl. Psychol.,
1945, 29, 177-184.

5. Lawshe, C. H., Jr., and Alessi, S. L. Studies in job evaluation: IV. Analysis of another point
rating scale for hourly-paid jobs and the adequacy of an abbreviated scale.
J. appl. Psychol., 1946, 30, 310-319.

6. Lawshe, C. H., Jr., and Maleski, A. A. Studies in job evaluation. 3. An analysis of point ratings
for salary paid jobs in an industrial plant. J. appl. Psychol., 1946, 30, 117-128.

7. Lawshe, C. H., Jr., and Wilson, R. F. Studies in job evaluation. 5. An analysis of the factor
comparison system as it functions in a paper mill. J. appl. Psychol., 1946, 30,

8. Stead, W. H., Shartle, C. L., and Associates. Occupational counseling techniques.
New York: American Book Co., 1940.


Studies in Job Evaluation: 8. The Reliability of an 
Abbreviated Job Evaluation System 


C. H. Lawshe and Patrick C. Farbro 
Occupational Research Center, Purdue University 


Several systems for evaluating jobs have been developed. Of these 
much has been written and considerable experimentation has been carried 
on because the setting of wage rates is one of the most important mana- 
gerial functions. The great majority of these systems arrive at their 
goal—the systematic pricing of jobs—by breaking the jobs into their 
various elements or components. The number of elements on which 
jobs have been rated varies from system to system. Using a scaling 
method of some sort, the rater assigns degrees of each component to each 
job, the various degrees are weighted and total point values are con- 
verted into wage rates. 

Previous Studies. As a result of a series of studies by the senior 
author and others (1, 2, 3, 4, 5, 6), an abbreviated system of job evaluation 
has been developed and reported. When forty job descriptions were 
submitted to two groups of independent raters, one of which applied the 
NEMA system and the other used this system, a correlation of .90 
between the two was obtained (7). Lawshe and Wilson (6) have shown 
in a previous study that the abbreviated system of four items is more 
reliable (.98 for five raters) than the NEMA system (.94 for five raters). 
However, since their data were gathered by sending job descriptions by
mail to the cooperating analysts, the question of functional reliability 
in the practical situation remains unanswered. 

Purpose of this Study. The primary purpose of this study was to 
determine the reliability or consistency with which raters, all from the 
same plant, evaluate jobs in that plant by means of this simplified system. 

More specifically, the purposes of this study are: (1) to compare
reliability coefficients obtained in Lawshe and Wilson’s study of hypo-
thetical jobs with reliability coefficients in an operating plant; (2) to
compare independent ratings made by the evaluation committee with
ratings adjusted through conference discussion; and (3) to examine
rating differences between labor committee members and management
committee members.




Procedure 


Abbreviated Job Evaluation System. The system of job evaluation
used in this study is that developed by Lawshe. The system provides
for the rating of jobs on four scales: “General Schooling,” “Learning
Period,” “Working Conditions,” and “Job Hazards.”

The Job Evaluation Committee. The committee used in evaluating
the forty-three jobs in this study consisted of five members. Two of
the members were employees belonging to the union. Management
representatives on the job evaluation committee included the production
[word illegible] and the secretary of the company. The fifth member of the
committee was the production superintendent during the time production
jobs were being evaluated and the maintenance superintendent while
maintenance jobs were being evaluated.

Rating the Jobs. The actual procedure of rating the jobs consisted of
[number illegible] phases which were preceded by standard job description prepara-
tion. After a general orientation, each committee member was furnished
a set of forty-three 3” by 5” white cards on which had been typewritten
the forty-three job titles. The committee was then instructed to
consider only the “General Schooling” required for performing each job
and on that basis to place the cards in rank order from the job requiring
the greatest amount of schooling to the job requiring the least amount
of schooling for successful performance on the job. On completion of
this task, a set of six colored cards representing each of the six degrees of
the “General Schooling” scale was given each committee member. Mem-
bers were then instructed to insert the colored cards in their stack of white
cards at the places most logical for the breaks. Thus the degrees of the
“General Schooling” scale were assigned. In similar manner, “Learning
Period,” “Working Conditions,” and “Job Hazards” scales were em-
ployed in rating each job.

From these cards a summary page showing the degrees assigned each
job by each committee member was prepared and anchor jobs, those on
which at least four of the members initially agreed, were identified. The
committee was then again assembled and by using the anchor jobs
as reference points, members discussed and adjusted the ratings for those
jobs on which there was disagreement. It is important, however, that
the initial ratings were made without committee discussion.
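The identification of anchor jobs from the summary page can be sketched as below. The job titles and degree values are hypothetical; only the rule, agreement of at least four of the five members on the same degree, comes from the procedure described above.

```python
from collections import Counter

def anchor_jobs(degrees_by_job, min_agree=4):
    """Return {job: agreed degree} for jobs on which at least `min_agree`
    committee members initially assigned the same degree."""
    anchors = {}
    for job, degrees in degrees_by_job.items():
        degree, count = Counter(degrees).most_common(1)[0]
        if count >= min_agree:
            anchors[job] = degree
    return anchors

# Hypothetical summary page for one scale: five members' degrees per job.
summary_page = {
    "Job 1": [3, 3, 3, 3, 2],
    "Job 2": [1, 2, 2, 3, 3],
    "Job 3": [5, 5, 5, 5, 5],
}
anchors = anchor_jobs(summary_page)
```

Jobs not returned by the function are those on which the committee would need conference discussion, using the anchors as reference points.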


Results

Obtained “one against one” Reliability Coefficients. Shown in
Table 1 in the second column are the obtained reliability coefficients for
each of the items of the abbreviated system and for total points as jobs
were evaluated in this study. The figures shown in this column are the
averages of the coefficients obtained by correlating initial ratings of each
rater with initial ratings of every other rater.1 The figures shown in
column two are the most likely correlations between the ratings of one
rater and the ratings of one other rater. For convenience and for com-
parison with the previous study by Lawshe and Wilson (6), these have
been called the “one against one” reliabilities.
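The averaging step mentioned in footnote 1, transformation of each coefficient to Fisher's z, averaging, and transformation back, can be sketched as follows. The input coefficients are hypothetical, and the function name is an illustrative choice.

```python
import math

def average_correlation(rs):
    """Average correlations through Fisher's z: z = atanh(r), r = tanh(mean z)."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))

# Hypothetical pair of rater intercorrelations.
avg = average_correlation([0.90, 0.50])
```

Averaging in z rather than in r gives somewhat more weight to the higher coefficients; here the result is about .77 rather than the arithmetic mean of .70.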


Table 1

Reliability Coefficients for Total Point Ratings and for the Component Scale
Ratings in the Lawshe-Wilson Study and in This Study

                         “One against one”       “Five against five”
                            Reliability              Reliability
                        Lawshe-      This        Lawshe-      This
Item                    Wilson       Study       Wilson       Study
Total Points             .89          .91         .98          .98
Learning Period          .86          .84         .97          .96
General Schooling        .79          .84         .95          .96
Working Conditions       .61          .73         .89          .93
Job Hazards              .51          .54         .84          .86


The “five against five” Reliability Coefficients. Even though the re-
liability coefficients shown in column two are those actually obtained,
they are inadequate estimates of the true reliability of pooled ratings of
members of the committee. As was mentioned before, the “one against
one” reliabilities are the best estimate of reliability of the ratings of one
rater as compared with those of one other rater. Since five raters were
involved in the rating of each job, the reliabilities in the second column
were “stepped up” by use of the Spearman-Brown formula to estimate
the reliability of the pooled ratings of all five of the job evaluation com-
mittee members. These “stepped up” reliabilities are shown in Table 1 in
the fourth column.

It is not advocated that these coefficients of reliability be accepted as
absolute, but merely that they are estimates of the true reliabilities of
the abbreviated job evaluation system. The results presented should
be qualified in view of one’s own evaluation of the assumptions involved
in such a procedure.
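The stepping-up can be reproduced with the Spearman-Brown formula in its usual form, r_k = k r / (1 + (k - 1) r), where r is the "one against one" reliability and k the number of raters pooled. The sketch below applies it to the obtained coefficients of this study; the function name is an illustrative choice.

```python
def spearman_brown(r, k):
    """Estimated reliability of the pooled ratings of k raters,
    given the average single-rater reliability r."""
    return k * r / (1 + (k - 1) * r)

# "One against one" reliabilities from Table 1 (this study), stepped up to five raters.
stepped = {
    "Total Points": spearman_brown(0.91, 5),
    "Learning Period": spearman_brown(0.84, 5),
    "Working Conditions": spearman_brown(0.73, 5),
}
```

Because the tabled single-rater values are rounded, a recomputed value can differ from the tabled one in the second decimal (for example, .54 steps up to about .85 against the tabled .86 for Job Hazards).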

It is evident from column four that reliability coefficients of the mag-
nitude found are definitely high enough for purposes for which the system
was designed. As will be noted, “five against five” reliability coefficients
for the scales range from .86 (Job Hazards) to .96 (Learning Period),
with all but one scale, “Job Hazards,” above .90. Agreement among
raters as evidenced by a reliability coefficient of .98 for total points
indicates high enough reliability for most practical purposes.

1 Correlations were found between the following patterns of pairs of raters: A-B,
A-C, A-D, A-E, A-F, B-C, B-D, B-E, B-F, C-E, C-F, D-E, D-F, E-F. Obtained
coefficients were averaged by transformation to Fisher Z-values (9).
indicates high enough reliability for most practical purposes. 
Comparison of Lawshe-Wilson Study and This Study. The first item
of interest in comparing the data from the two studies in Table 1 is the
close agreement of the reliabilities found for total point ratings (.89 and
.91 for “one against one” reliabilities and .98 and .98 for “five against
five” reliabilities).

The single items found most reliable in this study are those of the
“Skill Demands” factor (Learning Period and General Schooling) and
are the same as those found most reliable in the previous study.
The next most reliable item in this study (Working Conditions) has
the same rank position in the Lawshe-Wilson study. The least reliable
item of the abbreviated system (Job Hazards) was found in the same
position in both studies.

Examination of the reliabilities indicates that those in the Lawshe-Wilson
study were a conservative estimate of the reliability of the abbreviated
system when used in an actual industrial situation. This is easily understood
since in the Lawshe-Wilson study the several raters were from different
and scattered geographical locations and used only job titles and
descriptions in evaluating the jobs, while in this study five raters were
evaluating definite jobs in a plant with which each was familiar.

Comparison of Management and Labor Ratings. In comparing the
reliability of labor and management committee members, the first item of
interest is the consistency of the findings as shown in Table 2. The
correlation coefficients representing reliability or agreement between the
labor committee members are consistently lower than those representing
agreement between two management members. The “one against one”
reliability coefficients for labor union committee members range from .37
(Working Conditions) to .83 (Total Points), while for management
members they range from .80 (Job Hazards) to .94 (Total Points).

Considering agreement between two labor union committee members
and two management representatives,2 “one against one” reliability
coefficients range from .66 (Job Hazards) to .86 (Total Points). These
reliability coefficients, it will be noted from Table 2, fall between those of
the labor members which are lowest and those of management repre-
sentatives which are highest.

2 Average of obtained correlations between each management member and each labor
member, derived by transformation to Fisher Z-values.




Table 2

Coefficients of Reliability for Two Labor and Two Management Job
Evaluation Committee Members

                        Labor-      Mgmt-      Labor-
Item                    Labor       Mgmt       Mgmt
Total Points             .83         .94        .86
Learning Period          .73         .86        .80
General Schooling        .71         .92        .77
Working Conditions       .37         .90        .71
Job Hazards              .68         .80        .66


Comparison of Initial Ratings with Adjusted Ratings. As was pre- 
viously mentioned two sets of ratings were available for each job title— 
initial ratings, independently assigned by the raters for each of the 
various scales for each job title, and adjusted ratings, those resulting from 
conference discussion. The mode of the adjusted ratings was the point 
value actually used as the basis for the wage structure in the plant. The 
mean or average points assigned by each were used in the Lawshe-Wilson 
study since it was impossible to assemble the various raters in a conference 
for the purpose of adjusting ratings. For this reason it was considered 
advisable to obtain a measure of relationship between the mean initial 
ratings and the mode of adjusted ratings. In Table 3 the correlation is 
shown to be .97 for Total Points, and to range from .83 (Job Hazards) 
to .94 (General Schooling) for the component scales of the abbreviated 
system. These values are probably large enough to support the hy- 
pothesis that conclusions based upon mean independent ratings are valid 
for plant situations in which majority decisions are reached. 

Table 3
Coefficients of Correlation Between Mean of Initial Ratings and
Mode of Adjusted Ratings

Item                     r
Total Points            .97
Learning Period         .92
General Schooling       .94
Working Conditions      .86
Job Hazards             .83

Similarly, it seemed desirable to investigate separately the relation-
ship between initial ratings as made by management, labor, and main-
tenance and production superintendents and the mode of adjusted
ratings. Table 4 shows these relationships. Coefficients of correlation
between the mean of management representatives’ initial ratings and the
mode of adjusted ratings were found to be consistently larger (ranging
from .73 on “Job Hazards” scale to .97 for Total Points) than those of
labor union members (.65 to .93 on the same scales). The rank order of
size of the coefficients is the same for both management and labor
committee members.


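Table 4

Coefficients of Correlation Between Initial Ratings for Management, Labor,
and Superintendents, and Mode of Adjusted Ratings

                                            Superintendents
Item                  Labor     Mgmt       Maint.     Prod.
Total Points           .93       .97        .97        .97
Learning Period        ...       .95        .96        .96
General Schooling      ...       .93        .93        .83
Working Conditions     ...       .86        .85        .90
Job Hazards            .65       .73        .77        .89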


Also shown in Table 4 are the correlations between the maintenance
and production superintendents’ initial ratings and the mode of adjusted
ratings. The magnitude of these coefficients (ranging from .77 on “Job
Hazards” scale to .97 for Total Points) is higher than those of the labor
union committee members. They are also larger than those of the
management committee members on all but the “General Schooling”
scale.

In considering the change from initial ratings to adjusted ratings, it
was also deemed advisable to examine actual point value changes. This
was accomplished by tabulating each rater’s actual point change from
initial ratings to his adjusted ratings. The point value for each item
and total points assigned by each rater for each job was considered in
this analysis. For example, on job number 1, Rater A initially rated
the job as being worth 130 points on the “Learning Period” scale. During
conference discussions his rating was changed to 150 points; thus a
change of 20 points was tabulated. This procedure was followed throughout.

Shown in Table 5 is the mean gross change per job by raters from
initial point ratings to adjusted point ratings. These point changes were
obtained as above by adding point value changes disregarding algebraic
signs. From Table 5 a general trend may be seen. Raters “E” and “F”,
both labor members, have the greatest average gross change from initial
to adjusted ratings while Rater “C”, the maintenance superintendent,
changed his initial ratings the least. Raters “A” and “B”, the two
management committee members, changed less (10.4 and 10.1 average
points per job, respectively) from initial ratings to adjusted ratings than
did the labor committee members (18.6 and 18.3 average points per
job for Raters “E” and “F”, respectively).

Table 5
Mean Gross Change per Job of Raters from Initial Point Ratings
to Adjusted Point Ratings

                        Total    Learning   General     Working      Job
Rater             N     Points   Period     Schooling   Conditions   Hazards
A (Mgmt)         43     10.4       6.9        3.1          1.3         .7
B (Mgmt)         43     10.1       4.4        5.0          ...         .9
C (Sup-Maint)    16      1.8       1.2        0.0           .4         .2
D (Sup-Prod)     27     10.2       3.8        5.2           .2        1.2
E (Labor)        43     18.6       8.7       12.4          1.1         .9
F (Labor)        43     18.3      12.0        5.7          1.9         .8
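The distinction between gross change (signs disregarded) and net change (signs retained) can be made concrete with a small sketch. Only the first pair of values (130 changed to 150) comes from the example in the text; the remaining ratings are hypothetical.

```python
def mean_changes(initial, adjusted):
    """Return (mean gross change, mean net change) per job for one rater."""
    diffs = [a - i for i, a in zip(initial, adjusted)]
    n = len(diffs)
    gross = sum(abs(d) for d in diffs) / n   # algebraic signs disregarded
    net = sum(diffs) / n                     # algebraic signs considered
    return gross, net

# Job 1 follows the text's example; jobs 2 and 3 are hypothetical.
initial = [130, 100, 80]
adjusted = [150, 90, 80]
gross, net = mean_changes(initial, adjusted)
```

A rater whose increases and decreases offset one another shows a small net change but a large gross change, which is why the two tabulations are reported separately.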

In Table 6 the average net change per job by raters from initial point
ratings to adjusted point ratings is shown. These values were obtained
in the same manner as described above except that algebraic signs were
considered. In general the trend shown in Table 6 is that Raters “A”
and “B”, the two management raters, and Raters “C” and “D”, the
maintenance and production superintendents, initially tended to under-
rate the jobs except in relation to the “Working Conditions” scale;
therefore, they had to increase point ratings in the conferences, while
Raters “E” and “F”, both labor union members, over-rated jobs on the
“General Schooling” and “Working Conditions” scales but under-rated
on the “Learning Period” and “Job Hazards” scales.

Table 6
Average Net Change per Job of Raters from Initial Point Ratings
to Adjusted Point Ratings

                        Total    Learning   General     Working      Job
Rater             N     Points   Period     Schooling   Conditions   Hazards
A (Mgmt)         43     +6.9       5.8        ...          ...         .6
B (Mgmt)         43     +2.6       ...        ...          ...         ...
C (Sup-Maint)    16     +1.3      +1.2        0           +.4         -.3
D (Sup-Prod)     27      ...       ...        ...          ...       +1.2
E (Labor)        43      2.0      +5.6       -6.9          ...        +.2
F (Labor)        43     +8.3      +9.3        ...         -1.3        +.6

It is interesting to note that the mean of the initial points as conceived
by the job evaluation committee is 250.30 while after conference discussion
the mean of adjusted ratings is 254.63. In analyzing this difference of
4.33 points, a critical ratio of 2.47 was found, indicating the difference
to be significant at the 2 per cent level of confidence.
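The stated significance level can be checked under the usual assumption that a critical ratio is referred to the normal curve with a two-tailed test. That assumption, and the function name, are illustrative; the source reports only the critical ratio and the level.

```python
import math

def two_tailed_p(critical_ratio):
    """Two-tailed probability of a normal deviate at least this large."""
    return math.erfc(abs(critical_ratio) / math.sqrt(2.0))

p = two_tailed_p(2.47)
```

The resulting probability is a little over .01, consistent with significance at the 2 per cent level but not at the 1 per cent level.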


Summary and Conclusions 


Job evaluation data for forty-three jobs from a manufacturing plant
using an abbreviated evaluation system were analyzed. The job evalua-
tion committee, including two management members, two employees
affiliated with the labor union active in the plant, and the maintenance
superintendent or production superintendent when maintenance or pro-
duction jobs were being considered, evaluated each job on the four items
of the abbreviated system.

Reliability coefficients for the total point ratings and for the individual
items were obtained by correlating ratings given each job title on the
basis of each of the four factors of the evaluation system. Correlations
were found between the ratings of each rater as paired with every other
rater and these obtained coefficients were averaged after transformation
to Fisher Z-values. These average intercorrelations were then stepped-
up using the Spearman-Brown formula to obtain the estimated reliability
of ratings of the five-member committee.

Comparison was made with a previously published study by Lawshe
and Wilson which employed the abbreviated evaluation system. Anal-
ysis was also made comparing independent ratings with ratings adjusted
in conference discussion. Differences in the agreement or consistency
with which labor and management committee members rate jobs
were also explored.

The following conclusions are supported:

1. The abbreviated system demonstrates reliability sufficiently high
for most practical purposes (.98 for five raters).

2. Comparison with the Lawshe-Wilson study shows the same rank-
order of reliability for the individual scales.

3. Reliability coefficients for agreement between labor and management
members fall between those of labor members which are of lowest
magnitude and those of management representatives which are highest.




4. High correlation was found between mean initial ratings (those
assigned independently) and the mode of adjusted ratings (adjusted
during conference discussion) for Total Points (.97) and also for the
component items of the system (ranging from .83 to .94).

5. Initial ratings as conceived by management, labor, and the super- 
intendents separately when compared with the mode of adjusted ratings 
showed the superintendents to initially rate jobs more accurately as 
evidenced by correlation with adjusted ratings than did management 
committee members or labor union committee members. Management 
members’ initial ratings agreed more closely with final ratings than did 
labor union members’ initial ratings. The same relationship was found 
in analyzing actual point changes from initial to mode of adjusted ratings. 

6. Throughout this analysis the Skill Demands factor as measured by
“Learning Period” and “General Schooling” items was found the most
stable as evidenced by the fact that average intercorrelations for these
two scales were largest; that agreement between management and labor
members on these two scales was greater than on other scales; and that
correlations between initial and adjusted ratings on these two scales were
higher than on scales of the “Job Characteristics” factor.

Received December 22, 1948. 
Early publication. 


References 


1. Lawshe, C. H., Jr., and Satter, G. A. Studies in job evaluation. 1. Factor analysis 
of point ratings for hourly paid jobs in three industrial plants. J. appl. Psychol., 
1944, 28, 189-198. 

2. Lawshe, C. H., Jr. Studies in job evaluation. 2. The adequacy of abbreviated 
point ratings for hourly-paid jobs in three industrial plants. J. appl. Psychol., 
1945, 29, 177-184. 

3. Lawshe, C. H., Jr. Studies in job evaluation. 3. An analysis of point ratings for
salary paid jobs in an industrial plant. J. appl. Psychol., 1946, 30, 117-128.

4. Lawshe, C. H., Jr., and Alessi, S. L. Studies in job evaluation. 4. Analysis of
another point rating scale for hourly-paid jobs and the adequacy of an abbre-
viated scale. J. appl. Psychol., 1946, 30, 310-319.

5. Lawshe, C. H., Jr., and Wilson, R. F. Studies in job evaluation. 5. An analysis
of the factor comparison system as it functions in a paper mill. J. appl. Psychol.

6. Lawshe, C. H., Jr., and Wilson, R. F. Studies in job evaluation. 6. The relia-
bility of two point rating systems. J. appl. Psychol., 1947, 31, 355-365.

7. Lawshe, C. H., Jr., Dudek, Edmund E., and Wilson, R. F. Studies in job evalu-
ation. 7. A factor analysis of two point rating methods of job evaluation.
J. appl. Psychol., 1948, 32, 118-129.

8. Peters, C. C., and Van Voorhis, W. R. Statistical procedures and their mathematical
bases. New York: McGraw-Hill Book Co., 1940.

9. Snedecor, G. W. Statistical methods. Ames, Iowa: Iowa State College Press, 1946. 


Odor Selection, Preferences and Identification


Bernard Locke and Charles H. Grimm 
Brooklyn, N. Y. 


In the light of the fact that many millions of dollars are spent annually
in the purchase of aromatic products it is extremely surprising that so
little work has been done in any systematic fashion to evaluate some of
the factors which lead an individual to select a particular aromatic
compound for purchase. It is the purpose of this paper to explore, in a
preliminary fashion, several of the factors which might play a part in
odor selection.
The broad elements to be dealt with in this research include: 1. The
ability to differentiate between “expensive” and “inexpensive” odors.
2. The relationship between subjective concepts of costliness and “pleas-
antness” or “unpleasantness” of a perfume compound. 3. The ability to
recognize some of the more common floral odors.
The 69 female subjects used were a select rather than a cross section
sampling in that they were students in an advanced collegiate course in
psychology and our interpretations of the results will, therefore, take
this into consideration. The average age of the group was 24.7 years
with a range from 19 to 50 years. The length of time that these indi-
viduals had been using perfumes ranged from one to twenty-five years
with a mean of 7.2 years.


Experiment 1. The Ability to Differentiate Between 
“Expensive” and “Inexpensive” Odors 


A search of the psychological literature for the past five years reveals
only one experimental exploration of the ability of individuals to dif-
ferentiate between expensive and inexpensive perfumes. In this ex-
periment G. M. Jewett¹ employed three pairs of perfumes each containing
an inexpensive member (50¢ an ounce) and an expensive one ($8.00 to
...00 per ounce). His subjects were asked to compare them as to
“desirability” or affect and “lasting quality” purely on the
basis of the smell stimulus. Jewett concluded from his data that in
these respects the inexpensive perfumes produced substantially the same
results as the expensive.

¹ Jewett, G. M. A note on the relation between subjective estimates of the desira-
bility and the lasting quality of certain perfumes and their cost. J. gen. Psychol.,
33, 285-290.


In the present experiment the 69 subjects were individually given 
perfumers’ blotters that had been dipped into standard strength samples 
(16 oz. of oil to 128 oz. of alcohol) of eight perfumes and asked to indicate 
on a check sheet whether they thought the perfume to be an expensive 
or inexpensive one and at the same time whether they thought it a 
pleasant or unpleasant one. A description of the perfume oils, odor 
types and their costs is as follows. 


Each of the oils has been found to be commercially acceptable and has 
been in use for a period of years. The average cost of the inexpensive oils 
(Numbers 1, 3, 5 and 7) is $5.00 per pound and the average cost of the expen- 
sive compounds (Numbers 2, 4, 6 and 8) is $60.00 per pound. The floral odors 
used were selected for their high fidelity in reproducing the actual floral note 
demonstrated in many years of use. Odor No. 1. A heavy, sweet, balsamic,
amber type; 2. A subtle chypre-floral, French, modern bouquet; 3. A modern, 
sweet, resin, aldehyde-chypre type; 4. A modern, floral-spice, fantasy type; 
5. A sweet, modern, trefle, “outdoor” type; 6. A sophisticated, aldehyde- 
floral, fantasy type; 7. A modern, aldehyde, French type; and 8. A heavy, 
sweet, balsamic, amber type. 


Table 1
Subjective Estimates of Cost of Eight Perfume Samples
Note: Items Marked with a * Are the “Expensive” Compounds

Perfume    Judged         Judged       Per Cent of
  No.      Inexpensive    Expensive    Correct Responses

   1           49            20              71
   2*          26            43              62
   3           41            28              59
   4*          44            25              36
   5           41            28              59
   6*          36            33              48
   7           44            25              64
   8*          39            30              43


Table 1 presents the selections. The range of correct estimations of
cost runs from 36 per cent to 71 per cent. If the responses for all eight
odors are averaged the mean percentage of correct responses is 55, or
just slightly better than if the selections had been made purely by chance.
However, if we consider the accuracy of the judgments as regards the
expensive and inexpensive odors separately we find that 63.3 per cent of
the subjects made accurate choices of the inexpensive odors as compared
with about 47 per cent correct choices for the expensive odors. The computed
critical ratio is 2.56, indicating that the difference is significant at the
2 per cent level but not at the 1 per cent level.
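The percentages discussed above follow directly from the counts in Table 1. The sketch below recomputes them; it is a check on the arithmetic only, and the variable names are introduced here for illustration. It does not attempt to reproduce the critical ratio, since the formula the authors used for it is not stated.

```python
# Counts from Table 1: perfume number -> (judged inexpensive, judged expensive,
# whether the compound is actually one of the "expensive" ones).
table1 = {
    1: (49, 20, False), 2: (26, 43, True),
    3: (41, 28, False), 4: (44, 25, True),
    5: (41, 28, False), 6: (36, 33, True),
    7: (44, 25, False), 8: (39, 30, True),
}

def percent_correct(n_inexpensive, n_expensive, is_expensive):
    """Per cent of the subjects whose judgment matched the true cost class."""
    correct = n_expensive if is_expensive else n_inexpensive
    return round(100 * correct / (n_inexpensive + n_expensive))

pct = {no: percent_correct(*row) for no, row in table1.items()}

mean_all = sum(pct.values()) / len(pct)
inexpensive = [pct[no] for no, row in table1.items() if not row[2]]
expensive = [pct[no] for no, row in table1.items() if row[2]]
mean_inexpensive = sum(inexpensive) / len(inexpensive)
mean_expensive = sum(expensive) / len(expensive)

print(pct)
print(mean_all, mean_inexpensive, mean_expensive)
```

Averaging the rounded percentages gives 55.25 overall, 63.25 for the inexpensive odors and 47.25 for the expensive ones, in line with the 55 and 63.3 per cent quoted in the text.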

If one considers the direction of the errors made it is found that in
37 per cent of the estimations inexpensive perfume compounds were
classed as “expensive” while 53 per cent of the estimations of the ex-
pensive compounds categorized them as being “inexpensive.” Thus,
there appears to be a distinct tendency to minimize rather than to exaggerate the
cost of the odor samplings.
The mean number of correct identifications as to relative costliness
of the eight perfume samples was 4.4. Not one of the 69 individuals
was able to classify all eight correctly nor did any individual fail to make
at least one correct choice.
In order to determine whether length of use plays any part in devel-
oping skill in differentiation between expensive and inexpensive odors
the group was divided into those who had used perfume from 0 to 5 years
(N = 32) and those who had been using it for 6 or more years (N = 37).
Comparison of the number of correct selections of the members of these
two groups reveals that there is no demonstrable improvement in ability
to differentiate between the expensive and inexpensive odors with in-
creasing numbers of years of perfume usage. This is best demonstrated
by the fact that the average number of appropriate selections for both
groups is identical, namely, 4.4 correct.
In order to evaluate the role of frequency of use as opposed to length of
time in developing the ability to differentiate between expensive
and inexpensive perfumes the subjects were asked to indicate the fre-
quency with which they used perfumes. This was done on a four point
scale which was made up of the following steps: Frequently, Occasion-
ally, Rarely, Not at all. The need for such an evaluation is best illus-
trated by the response of the oldest member of the group who, in reporting
the number of years that she had used perfume, replied, “Once a year
for twenty-five years.” Because of the small size of the experimental
group the one subject who fell in the “not at all” category, and who,
incidentally, made 5 correct selections, has been thrown into the “rarely”
group. The results indicate that for the present experimental sample
there is no measurable difference in ability to discriminate expensive
from inexpensive perfumes among individuals who use perfumes fre-
quently, occasionally or rarely, the mean number of correct choices being
..., ... and 4.5 respectively.


Experiment 2. Relationship Between Subjective Concepts of Costliness
and “Pleasantness” or “Unpleasantness” of a Perfume Compound


Since it is fairly common experience that with some individuals com-
pounds can be “costly” and still “unpleasant” and vice versa it was
decided to explore the frequency with which such variations occurred.




At the time that each of the subjects determined whether a sample was 
expensive or inexpensive she was also asked to indicate whether the odor 
was pleasing or unpleasant to her. Table 2 presents the frequency with 
which each of the eight odors used in Experiment 1 was designated with 
the apparently contradictory adjectives “Inexpensive and Pleasant” or 
“Expensive and Unpleasant.” From this table we note that a consider- 
able amount of disagreement exists between the individual’s evaluation 
of the cost of each of the perfumes and its pleasantness. This difference
actually constitutes an average of 31.5 per cent or, virtually, one-third
of the total number of comparisons made. When the discrepancies for
the “expensive” and “inexpensive” groups of perfumes are compared no 
difference is found. The mean percentage of differences is 31.8 per cent 
for the inexpensive odors and 31.3 per cent for the expensive group. 
While there was a slightly greater tendency to attribute unpleasantness 
to odors thought to be costly than to consider as pleasant those com- 
pounds which were thought to be inexpensive, this difference is not 
sufficiently great to be significant. 


Table 2
Differences in Subjective Concepts of Costliness and Pleasantness or
Unpleasantness of 8 Perfume Compounds
Note: Those Perfumes Marked with a * Are Expensive

Perfume    Inexpensive-    Expensive-     Total           Total
  No.      Pleasant        Unpleasant     Disagreement    Agreement

   1           13               6              19             50
   2*          15              11              26             43
   3           10              14              24             45
   4*          14               9              23             46
   5           12              14              26             43
   6*           9              10              19             50
   7            5              14              19             50
   8*           3              16              19             50


In considering the number of instances in which there was disagree- 
ment between the concepts of costliness and pleasantness for each of the 
individuals we learn again of the disagreement in attitudes between cost
and pleasantness. The mean number of disagreements for each of the
individuals in terms of pairing inexpensiveness and pleasantness is 1.2 
and the mean for the expensive-unpleasant pair is 1.5. In one instance 
where the subject classed all of the perfumes as expensive, she also con- 
sidered them all as being unpleasant. 




Experiment 3. An Investigation of the Ability of a Group of Subjects
to Recognize Some of the More Common Floral Odors


This section of the research was intended to examine the ability of the
experimental group to identify some of the more common floral
odors. The eight odors used were Lilac, Gardenia, Carnation, Rose,
Pine, Jasmin, Lily of the Valley and Geranium presented in that order.
Each subject was permitted to smell each of the odors on perfumers’
blotters after having been told that each of the odors that she would now
smell was that of a flower and that she was to identify it by name. Table
3 presents the number of correct identifications of each of the eight odors.


Table 3
Correct Identifications of Eight Floral Odors

                         Number of Correct    Correct Identification
Odor                     Identifications      in Percentages

Lilac                          24                     35
Gardenia                       23                     33
Carnation                      21                     30
Rose                           17                     25
Pine                           28                     41
Jasmin                          1                      1
Lily of the Valley             16                     23
Geranium                        0                      0


Examination of Table 3 shows that the number of correct identifications
of the floral odors used ranges from 0 (for geranium) to 28 (for pine) or
from 0 to 41 per cent of correct identifications. If one averages the
correct responses for all of the eight odors the resultant percentage of
correct responses is 23.5. The apparent order of difficulty in identifica-
tion ranging from most difficult to least difficult is Geranium, Jasmin, Lily
of the Valley, Rose, Carnation, Gardenia, Lilac and Pine. While it is
somewhat surprising that so much difficulty was evidenced in identifying
the various odors it is particularly interesting that the rose which is so
common and popular to our culture caused so much difficulty in recog-
nition with only one out of every four subjects being able to identify it.


Table 4 presents the findings for the number of correct identifications
by each member of our experimental group. This table reveals that 12
per cent of our subjects were unable to identify even one of the floral
odors used and, similarly, there was no individual who identified more
than four of the eight floral odors that we used.




Table 4
Correct Identifications of the Series of Eight Floral Odors

Number of Correct     Number of Individuals    Per Cent of
Identifications             (N = 69)           Total Group

       0                        8                  12
       1                       22                  31
       2                       17                  25
       3                       16                  23
       4                        6                   9
       5                        0                   0
       6                        0                   0
       7                        0                   0
       8                        0                   0

   Mean = 1.8                                     100


To illustrate the wide deviations in identification made by members 
of the group Table 5 presents the identities attributed to our samples of 
Rose and Carnation. 

In order to determine the effect of knowledge of the identity of the 
floral odors under investigation upon the accuracy of the identifications 
one-half of the experimental group (35 subjects) was asked to repeat 
this portion of the experiment but this time they were given the names 
of the odors presented in random order. Table 6 presents a comparison


Table 5
Identifications of Rose and Carnation Samples Made by the 69 Subjects

Rose                                      Carnation

Don’t Know               27               Don’t Know         24
Rose                     17               Carnation          21
Lily of the Valley       [illegible]      Gardenia            5
Easter Lily               5               Geranium            4
Gardenia                  4               Jasmin              3
Lilac                     3               Spice               3
Sweet Pea                 3               [illegible]         2
Jasmin                    2               Orange Blossom      2
Bouquet                   2               [illegible]         1
Cold C[illegible]         1               [illegible]         1
Baby’s Breath             1               [illegible]         1
[illegible]               1               Musk Blossom        1
Lemon Verbena             1               Clover              1
Geranium                  1
Carnation                 1




Table 6
Comparison of Accuracy of Identification of Floral Odors with and without
Knowledge of Their Identities

                         Correct Responses    Correct Responses
                         with Knowledge       without Knowledge
Odor                     of Identities        of Identities

Lilac                         57%                  35%
Gardenia                      46%                  33%
Carnation                     54%                  30%
Rose                          23%                  25%
Pine                          94%                  41%
Jasmin                        20%                   1%
Lily of the Valley            40%                  23%
Geranium                      20%                   0%

Mean                          44%                  23%


of the findings for this group and the original group. Examination of
this table reveals a rather marked improvement in identifications in all
odors except rose and this for some undetermined reason shows a
slight decline. The average improvement for the eight odors combined
is 21 per cent but the range is wide since it runs from -2 per cent (for
Rose) to +53 per cent (for Pine).
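The odor-by-odor gains behind this comparison can be tabulated with a few lines of arithmetic. The sketch below uses the percentages of Table 6 in the order in which the odors were presented; the figure for Carnation without knowledge of identities is taken as 30 per cent, the value given in Table 3. Names are illustrative only.

```python
odors = ["Lilac", "Gardenia", "Carnation", "Rose",
         "Pine", "Jasmin", "Lily of the Valley", "Geranium"]

# Per cent correct with and without knowledge of the odors' identities (Table 6).
with_names = [57, 46, 54, 23, 94, 20, 40, 20]
without_names = [35, 33, 30, 25, 41, 1, 23, 0]

gains = {odor: w - wo for odor, w, wo in zip(odors, with_names, without_names)}
mean_gain = sum(gains.values()) / len(gains)

smallest = min(gains, key=gains.get)   # odor with the least improvement
largest = max(gains, key=gains.get)    # odor with the greatest improvement

for odor, gain in gains.items():
    print(f"{odor:20s} {gain:+d}")
print("mean gain:", mean_gain)
```

The mean gain is 20.75 percentage points, with Rose at the bottom of the range (-2) and Pine at the top (+53).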

If one considers the contrast between the number of correct
identifications made by each of the subjects before and after the identities of
the odors were given, one finds that while the mean number of
correct responses has advanced from 1.8 to 3.5, there is still considerable
room for improvement. It is interesting to note that while none of the
subjects was able to identify more than four of the odors prior to their
identities having been made known, 10 of the 35 subjects were able to do
so after the list was made available. Two of the 35 were able to identify
... samples correctly.


Summary and Conclusions 


Employing 69 college students as the experimental group an attempt
has been made to evaluate some of the factors that play a part in odor
preferences and identifications. The results obtained are not intended
to indicate universal trends, since a select group was used, but they do
point to the need for further investigation in this area.

1. For the experimental group used the ability to recognize the
difference between expensive and inexpensive perfume compounds was
slightly better than chance, with the mean percentage of correct
responses being 55.




2. There was a greater tendency to select expensive perfumes as being 
inexpensive than vice versa. 

3. Length of use of perfumes apparently does not affect the ability 
to make accurate judgments as to the costliness of perfume compounds. 

4. Frequency of use does not affect the ability to make accurate 
judgments as to the costliness of perfume compounds. 

5. There is considerable disagreement between the individual’s evalua- 
tion of the cost of a perfume and its “pleasantness.” There was a slightly 
greater tendency to attribute unpleasantness to odors thought to be 
costly than to consider as pleasant those compounds which were thought 
to be inexpensive. 

6. Utilizing eight common floral odors it was found that our experi- 
mental group was able to identify them with less than 25 per cent accuracy 
(23.5 per cent correct). 

7. When 35 subjects were informed as to what eight floral odors were 
being utilized their accuracy in identification rose to but 44 per cent. 
Received October 25, 1948. 

Early publication. 


Prediction of Female Readership of Magazine Articles *


Evelyn Perloff 
Ohio State University 


This is the second of two studies attempting to predict the number
of individuals that will read a magazine article, prior to its publication.
The first study discussed the prediction of male readership of articles in
The Saturday Evening Post.¹ The purpose of the current study was to
determine the way in which five variables combined for maximum female
readership of articles in the Post. The reader who desires complete


The readership results of men and women were handled separately
on the assumption that interest patterns for magazine articles are well
defined according to sex. A comparison of the readership figures of Post
articles for males and females in these studies and many others will clearly
indicate the varying interests and preferences of the two sexes.²

Inasmuch as starting readership is based upon information obtained
from individual reports from respondents, it is essential to have some
measure of the accuracy of these reports. Ludeke and Inglis compared
the results of what readers of the Ladies’ Home Journal stated they had
read with what they were observed to have read. The results of this
informative experiment showed an average difference of 1.7% between
the two conditions, which seems to justify the conclusion that “reported
reading behavior did not differ materially from active reading behavior.”³
It is likely that similar results would be obtained with The Saturday
Evening Post. At present, the reliability of the criterion probably lies
within an error range of 8% (2σ value).⁴


* This study was conducted while the writer was a research associate in the Develop-
ment Division of the Research Department, Curtis Publishing Company.

¹ Perloff, E. Prediction of male readership of magazine articles. J. appl. Psychol.,
1948, 32, 663-674.

² Waples, D., and Tyler, R. W. What people want to read about. Chicago: Univer-
sity of Chicago Press, 1931, and unpublished studies, The Curtis Publishing Company,
Philadelphia, Pa.

³ Ludeke, H. C., and Inglis, R. A. A technique for validating interviewing methods
in reader research. Sociometry, 1942, 5, 109-122.

⁴ Blankenship, A. B. Consumer and opinion research. New York: Harper and
Brothers, 1943, Appendix, Table 2.


Results 


The five variables used were number of illustrations, color of illustra- 
tions, sex of persons in the illustrations, proportion of opening page(s)
devoted to text, and subject matter of the article. The findings will be 
presented in three sections: (1) The Distributions, (2) The Determination 
of the Composite Effect, and (3) The Cross-validation. 

The Distributions. All starting readership per cents are indexes and 
not actual figures. 

The relationship (r = .35) of number of illustrations to starting reader- 
ship per cent indicated on the face of it that the number of illustrations 
significantly influenced the female reader in starting an article. It was 
apparent that there were no clear-cut breaks in the distributions, as were
present in the study on male readership. Although there was a slight
upward trend in starting readership from articles having no illustrations
to those having eight or more, this trend was not very distinct. It was
clear, however, that female Post readers preferred articles with many
illustrations as compared to those with no illustrations. Both men and
women were equally influenced by this variable (r’s = .35) when its
effect on starting readership was determined, but all other variables
were permitted to vary.

There appeared to be a clear-cut relationship (r = .42) between the 
color of illustrations and starting the article. There were two definite 
breaks for the four categories in this variable. Thus, there were sharp 
changes from “other” to the two categories, black and white and duotone; 
and from black and white and duotone to full-color. It was clear that 
the women in this study did not differentiate between black and white 
and duotone but keenly preferred articles having full-color illustrations. 
The color of illustrations seemed to be of greater importance to women 
(r = .42) than to men (r = .28) in influencing them to start reading
articles in The Saturday Evening Post. 

The relationship (r = .38) between sex of persons in the illustrations
and starting readership also appeared to influence significantly the
starting readership of Post articles by women readers. Apparently, the
woman reader of the Post preferred any type of illustration other than
that showing only men. The woman reader preferred illustrations in-
cluding both males and females to illustrations including females alone,
though the difference was slight. Again, female readers seemed to be more
influenced by sex of persons in illustrations than male readers (r = .22).

There appeared to be a significant inverse relationship (r = -.36)
between proportion of opening page(s) devoted to text and how many
women would start to read an article. It was apparent that devoting
less than 20 per cent of the opening page(s) to text resulted in the highest
starting readership. The differences among the three classes (i.e. classes
in terms of amount of space devoted to text) were clear-cut and more
distinct than in the male readership study. The general trend was for
starting readership to improve as the per cent of text on the opening
page(s) decreased.

There was greater variation among the classes of the subject matter
variable than of any other. A number of the categories had too few
articles to merit consideration as a separate class. This eliminated various
topics which are part of the gamut of subjects upon which Post articles
are written. These articles were classified under the category, “Other.”
It was found, however, after completion of the male readership study,
that the category, “Other,” could be further broken down into eight
additional subject matter categories, making a total of 24 classes in the
subject matter variable.

With this revision, the correlation coefficient between subject matter
and starting readership per cent was raised from .46 to .60, both coeffi-
cients indicating clearly that the subject matter of an article considerably
influenced the female reader to start it. It was apparent that the women
(as well as the men) who read The Saturday Evening Post have definite
likes and dislikes of Post topics. Although there was a steady increase
in starting readership from topics least liked to those best liked, there
were also several sharp changes grouping together both similar levels of
preferences and similar kinds of subject matter. The general trend was
for female starting readership to improve significantly when Post articles
dealt with topics such as people at work, descriptions of peoples and
places (USA), and health and hygiene. These topics revealed a pref-
erence by female readers for human-interest articles. Action-type
articles such as those on sports, athletes, and labor, which rated among
the highest with male Post readers, offered less attraction to the women
readers.
The Determination of the Composite Effect. The correlation matrix
is shown in Table 1. The horizontal and vertical headings indicate the
variables used in the study. Number of illustrations gave the lowest
correlation (r = .35) with starting readership per cent, while the coeffi-
cient between subject matter and starting readership was the highest
(r = .60).


Table 1
Intercorrelations Between Variables 1-5 and of Starting Readership Per Cent
(N = 190)

                       Starting    No.      Color    Sex       % Text on
                       Reader-     of       of       of        Opening     Subject
Variable               ship %      Illus.   Illus.   Persons   Page(s)     Matter

Starting
  Readership %           --        .35      .42      .38       -.36        .60
No. of Illus.            .35       --       .67      .11       -.29        .19
Color of Illus.          .42       .67      --       .15       -.46        .33
Sex of Persons           .38       .11      .15      --        -.16        .25
Per Cent Text on
  Opening Page(s)       -.36      -.29     -.46     -.16        --        -.25
Subject Matter           .60       .19      .33      .25       -.25        --


Each of the five variables correlated higher with the criterion (starting
readership) for female readers of the Post than for male readers. The
writer is unable to say at this time whether this fact suggests that women,
by and large, are more impressionable than men, or more consistent in
their interests, or were more greatly influenced by the particular variables
used in the study, or whether this increased relationship (over male
readership) resulted from the author’s coding. It is probably safe, how-
ever, to conclude from the data in both this and the male study that
there is a significant sex difference in the readership habits of Post articles.

For prediction purposes the regression equation was computed. 
Table 2 shows the weights that each variable obtained. These weights 
are an approximation of the relative independent value of each variable 
to the success of the article (starting readership per cent). Use of this 
regression equation yielded an R of .70. The standard error of estimate 
was 9.7 per cent. Hence, the chances are that in about 68 out of 100 
cases the predicted starting readership per cents will be within an error 
of 10 points or less. We may be certain that very few starting readership 
estimates will be in error by more than 30 per cent. 
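The weights and the multiple R can be recovered from the correlation matrix by solving the normal equations for standardized regression weights. The sketch below is one way to do that in plain Python; it is offered as an illustration under the assumption that the published weights are ordinary beta weights, and the variable order, the names and the small elimination routine are choices made here, not the author's procedure.

```python
import math

def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

# Order: number of illus., color of illus., sex of persons, % text, subject matter.
# Intercorrelations among the five predictors, as read from Table 1.
r_xx = [
    [1.00, 0.67, 0.11, -0.29, 0.19],
    [0.67, 1.00, 0.15, -0.46, 0.33],
    [0.11, 0.15, 1.00, -0.16, 0.25],
    [-0.29, -0.46, -0.16, 1.00, -0.25],
    [0.19, 0.33, 0.25, -0.25, 1.00],
]
# Correlation of each predictor with starting readership per cent.
r_xy = [0.35, 0.42, 0.38, -0.36, 0.60]

betas = solve(r_xx, r_xy)                                   # standardized weights
multiple_r = math.sqrt(sum(b * r for b, r in zip(betas, r_xy)))

print([round(b, 2) for b in betas], round(multiple_r, 2))
```

Because the published correlations are rounded to two places, the solved weights should agree with Table 2 only approximately; the multiple R comes out close to the reported .70.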

Table 2
Weights of Five Variables for Predicting Starting Readership Per Cent
(N = 190) (R = .70)

Variables                                           Weight

Subject Matter                                        .45
Sex of Persons in Illustrations                       .22
Number of Illustrations                               .15
Proportion of Opening Page(s) Devoted to Text        -.14
Color of Illustrations                                .07


Calculation of the coded score weights (weights dependent upon the
coding scale of the specific variable) gave the necessary data for the
regression equation. The final equation is as follows:


Predicted Starting Readership Per Cent Index
    = 14.9 (Index)
      + 1.9 × class value (No. of Illus.)
      + 2.3 × class value (Color of Illus.)
      + 6.1 × class value (Sex of Persons in Illus.)
      - 3.1 × class value (Proportion of Opening Page[s] Devoted to Text)
      + 4.2 × class value (Subject Matter).
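Applying the equation is a matter of multiplying each variable's class value by its weight and adding the constant. The sketch below shows this with hypothetical class values, since the coding scales that assign class values to articles are not reproduced here; the constant is entered as printed, and the dictionary keys are names chosen for illustration.

```python
# Constant term and coded-score weights, as printed in the final equation.
CONSTANT = 14.9
WEIGHTS = {
    "number_of_illustrations": 1.9,
    "color_of_illustrations": 2.3,
    "sex_of_persons": 6.1,
    "text_on_opening_pages": -3.1,
    "subject_matter": 4.2,
}

def predicted_starting_readership(class_values):
    """Predicted starting readership index for one article layout."""
    return CONSTANT + sum(WEIGHTS[name] * class_values[name] for name in WEIGHTS)

# Hypothetical class values for a tentative layout (not taken from the study).
layout = {
    "number_of_illustrations": 3,
    "color_of_illustrations": 2,
    "sex_of_persons": 2,
    "text_on_opening_pages": 1,
    "subject_matter": 10,
}
print(round(predicted_starting_readership(layout), 1))
```

Changing one class value at a time shows how much a given layout decision moves the predicted index, which is the use of the equation described under the applications of the study.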


Inasmuch as the correlation coefficients of four variables (not including
subject matter) and the resulting multiple, .70, could be higher for pre-
dictive purposes, it is believed that there are other variables which pos-
sibly are of greater importance for prediction than the ones under present
study. This is particularly evident from the fact that the multiple
correlation coefficient was raised only .10 when these four variables were
considered along with the subject matter variable. In view, however,
of the paucity of information (i.e. other variables) and perhaps the
difficulty of measuring them, consideration of the present variables, in-
dividually but more requisitely all together, can make noticeable im-
provement in predicting starting readership.
The Cross-validation. To determine the extent to which the weights
of the characteristics of articles would be valuable in years other than the
year 1946, when the articles included in this study appeared, we have
applied this regression equation to 149 articles appearing in the 1947
issues of the Post. The correlation between the actual and predicted
starting readership per cents was .73. This validity coefficient was
slightly higher than the multiple (R = .70) and a reversal of the lower
validity coefficient (r = .36) and the multiple (R = .56) obtained in
the male readership study.
The higher correlation predictions for women readers in this later
year probably result from a constancy in interests over the period of the
year intervening. The increased number of classes in the subject matter
variable may also account for the slightly higher validity correlation
coefficient.
The average difference between the actual and predicted starting
readership per cents was 8.6 per cent. The predicted starting reader-
ship per cents were within 5 per cent of the actual starting readership in
... per cent of the articles, within 10 per cent in 68 per cent of the articles,
and within 15 per cent in 86 per cent of the articles.
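As a rough check on these proportions, one can ask what share of predictions would fall within a given margin if the errors were normally distributed with a standard deviation equal to the standard error of estimate of 9.7 found in the 1946 sample. Both the normality of errors and the carry-over of the 9.7 figure to the 1947 articles are assumptions made for this sketch.

```python
import math

STANDARD_ERROR = 9.7  # standard error of estimate from the 1946 sample

def share_within(limit, standard_error=STANDARD_ERROR):
    """Expected share of errors within +/- limit under a normal error model."""
    return math.erf(limit / (standard_error * math.sqrt(2)))

expected = {limit: share_within(limit) for limit in (5, 10, 15)}
for limit, share in expected.items():
    print(f"within {limit:2d} points: {100 * share:.0f} per cent")
```

Under these assumptions roughly 70 per cent of predictions would fall within 10 points and about 88 per cent within 15, close to the 68 and 86 per cent observed in the cross-validation sample.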


The Applications 


The applications of this study are identical to those discussed in the 
study on male readership. The primary application lies in checking the




value of a tentative layout for an article and making such layout changes 
as are necessary to increase the average readership of each issue of the 
magazine. Inasmuch as weights may change with time, continued 
follow-up is essential. 


Conclusions 
The following conclusions are supported: 


1. The multiple correlation and regression technique proved to be a 
successful method for predicting starting readership of Post articles by 
female readers. 

2. The accuracy of the predictions of future articles should fall within 
a 10 per cent difference between predicted and actual starting readership 
per cents in about 68 per cent of the cases. This percentage error is 
satisfactory for most practical purposes. 

3. The order of the relative importance of the five variables included
in this study is (a) subject matter; (b) sex of persons in illustrations; (c) 
number of illustrations; (d) proportion of opening page(s) devoted to text; 
and (e) color of illustrations. 


Received August 31, 1948. 


Special Review 


Buros, Oscar Krisen. The Third Mental Measurements Yearbook. New
Brunswick, N. J.: Rutgers University Press, 1949. Pp. xv, 1047.


... is the word for this sixth and latest offering in the familiar
series of bibliographical works on mental measurements edited by Buros.
Starting with a modest 44-page listing of tests in 1935, the phenomenal
growth of the series is charted in the following healthy chronology:


1935—Educational, Psychological, and Personality Tests of 1933
      and 1934—44 pages

1936—Educational, Psychological, and Personality Tests of 1933,
      1934, and 1935—83 pages

1937—Educational, Psychological, and Personality Tests of 1936—
      141 pages

1938—The Nineteen Thirty-Eight Mental Measurements Year-
      book—415 pages

1941—The Nineteen Forty Mental Measurements Yearbook—
      674 pages

1949—The Third Mental Measurements Yearbook—1047 pages


As indicated, the rate of increase in volume has been tremendous and as
yet shows no clear change in trend. One is reminded of the fable of the
sorcerer's apprentice; and one hopes that Buros has his wonderful
legerdemain under better control.

The Third Yearbook follows the familiar pattern of the earlier models.
There are two main sections: “Tests and Reviews” and “Books and
Reviews.” The first of these, comprising over two-thirds of the volume,
is the section for which the Yearbooks are best known, and on which
their reputation is founded. The comparative statistics for the “Tests
and Reviews” sections of the three Yearbooks are shown in the accom-
panying table.


Comparative Statistics of the Three Mental Measurements Yearbooks

                     1938                  1940                  Third
Period covered       Jan. 1937-June 1938   July 1938-Oct. 1940   Oct. 1940-Dec. 1947
Tests listed         313                   503                   705
Reviews              331                   503                   713
Reviewers            133                   250                   320




It was originally planned that new editions of the Yearbook would 
be issued at two-year intervals, but this schedule was interrupted by the 
war. The first post-war model, therefore, covers a period of some seven 
years—a fact which, in part at least, accounts for its gargantuan dimen- 
sions and excuses its minor sins of omission. The “Tests and Reviews” 
section of the Third Yearbook lists 663 tests (plus 42 references to books 
about single tests, e.g. the Rorschach) of which about 70 per cent are 
accompanied by one or more original reviews. Altogether, 713 reviews 
are contributed by 320 psychologists, educationists, subject-matter ex- 
perts, classroom teachers, and test technicians. Included also are 66 
excerpts from reviews which have already appeared elsewhere and (as
claimed in the preface—this reviewer did not count them) “3,368 refer-
ences on the construction, validity, use, and limitations of specific tests.”

The tests listed and reviewed purport to be all of the “commercially 
available tests—educational, psychological, and vocational—published 
as separates in English-speaking countries between October 1940 and 
December 1947.” In addition are included a selected list of “classics”
(e.g. the Army Alpha, Stenquist Mechanical Aptitude, Strong Vocational 
Interest Inventory, etc.) plus a few tests published during the 15 years 
since Buros first started fishing in these waters, but which somehow “got 
away” before.

For each test entry, the following useful information is provided, all 
condensed into a few lines: test title; description of groups for which 
intended; date of publication, copyright, or revision; whether or not 
machine scorable; whether individual or group test; forms, parts, and 
levels available; cost; testing time and total administration time; author; 
and publisher. A specimen entry is: 

American Council on Education Psychological Examination for High
School Students. . . . Separate answer sheets must be used; $2.00 per 25 tests; 50¢ per
25 machine-scorable answer sheets; 50¢ per specimen set; 35 (65) minutes;
L. L. Thurstone and Thelma Gwinn Thurstone; Educational Testing
Service.

Following this outline, for most test entries, are cross references to
earlier Yearbooks or bibliographies in the series, and references to books
and articles covering some aspect of the test. Then comes the feature
for which the series is best known—the original review. About a third of
the entries include reviews by more than one contributor. But more
about the reviews later.

The second main section of the volume—the “Books and Reviews” 
section—lists “549 books on measurements and closely related fields,” 
accompanied in most cases by excerpts from reviews of these books
culled from the journals. Here again the attempt has been to include 




er 1940 and December 1947. And here, as throughout the book, 
The editor’s preface states in 


“Reviews which included no critical comment are listed but not excerpted.
Readers should note that the critical portions of all book reviews, regard-
less of merit, found in professional and scholarly journals are included in
this yearbook. Asterisks and ellipses within excerpts indicate the omission
of non-evaluative material which appeared in the original review. . . .”


The selection of this material was presumably the sole responsibility
of the editor. Inclusion of all the words written about all of the books
would obviously have been both impossible and ridiculous. On the
other hand, wherever judgment must be employed in the selection and
editing of material, one is entitled to ask questions about the basis of the
selection, and to be suspicious of possible conscious or unconscious bias.
In this case, the editor assures us that all the critical appraisals of the
books listed, collected from all the reviews of these books appearing in
professional and scholarly journals, have been included, and only
purely descriptive or non-evaluative reviews or parts of reviews have
been left out. This being true, the reader might conclude that Buros’
own Nineteen Forty Mental Measurements Yearbook, since it accounts for
reviews in 1514 pages, was either the most important or the most
criticized—surely the most controversial—book on the subject published
1941. Sharers of the honor of evoking the most critical comment
would be the two volumes of Diagnostic Psychological Testing by Rapaport
(13 reviews, 1614 pages) and Stoddard’s The Meaning of Intelligence
reviews, 10 pages).
In addition to the two major review sections covering tests and books,
the volume contains five directories and indices. The first of these,
Periodical Directory and Index, serves both as a key to the abbreviations
used throughout in journal references and as a directory of journal editors.
The second, Publishers Directory and Index, gives addresses of test and
book publishers. The Index of Titles and the Index of Names are con-
ventional alphabetical listings. Finally, the Classified Index of Tests is
an expanded table of contents for the “Tests and Reviews” section,
listing each entry numerically (1-705).
The reputation of the earlier Yearbooks derived principally from the
“Tests and Reviews” sections; and the same will doubtless be true of the
latest volume in the series. The major rubrics are essentially the same
as those employed in the earlier issues: Achievement Batteries (22
entries); Character and Personality (91 entries); English (57 entries);
Fine Arts (7 entries); Foreign Languages (36 entries); Intelligence (89




entries); Mathematics (62 entries); Miscellaneous, e.g. home economics, 
safety, computational and scoring devices (111 entries); Reading (70 
entries); Science (44 entries); Social Studies (30 entries); and Vocations 
(86 entries). It is difficult to know what significance, if any, to attach to 
the numbers of entries in each category. Perhaps they illustrate the
difficulties of fitting fabricated classifications to any given series of data, 
particularly when the data present themselves without regard to the 
principles on which the classification was originally compounded. When 
this occurs, the pattern must be stretched here and there, and the miscel- 
laneous section originally provided for overflow inevitably grows bigger 
and bigger. In spite of this—and assuming the listings are as compre- 
hensive as claimed—it appears that the war and post-war periods have 
provided an atmosphere more congenial to production in the field of 
character and personality than elsewhere. This conclusion might be 
somewhat misleading, however, since one test alone accounts for nearly 
a quarter of all the entries under this heading. Needless to say, it is the 
Rorschach which somehow merits 67 pages, including a bibliography of 
598 titles! 
The original reviews themselves appear to this reviewer as a varied 
lot having only one factor in common—all are critical. Criticism is, in 
fact, the dominant tone of the whole volume (nil nisi bonum is definitely 
not the editor’s watchword!) and while going through it page by page 
one may conjure up an image of the editor at the head of his ranks of 
contributors daring the would-be test maker to attempt to get away 
with anything shoddy or unscrupulous. The image is an inspiring one, 
and, though fanciful, not too remote from the editor’s intention. One 
of the major objectives of the Yearbooks, in fact, is: 


“To impel authors and publishers to place fewer but better tests on the
market and to provide test users with detailed and accurate information
on the construction, validation, uses, and limitations of their tests at the
time that they are first placed on the market.”


To achieve this objective, the editor has instructed his cooperating 
reviewers to provide reviews that are “. . . frankly critical with both 
strengths and weaknesses pointed out in a judicious manner.” Just 
how “judicious” are such randomly selected remarks as: 


“This is just another test for neurotic tendencies. The reviewer can
see no reason for its publication or use. . . . The only excuse for
publishing another test of neurotic tendency in this day and age is in-
creased validity over other tests in the field. This test is grossly lacking

“With the perspective attained in the years since its publication . . . one
may view (test maker’s) arrant nonsense with tolerant amusement.”




“The instrument is a reversion to a type of psychological and sensory
testing that belongs to the infancy of mental measurement, and has
repeatedly been proved worthless as an index to higher mental ability.”

The reader must judge for himself. If such candid criticisms are indeed
warranted, the reviewers deserve full praise for their courage in saying
so. Fortunately, a good many of the reviewers have displayed this kind
of forthright frankness. Certainly, this reviewer is not advocating
gratuitous brutality for the sheer sadistic enjoyment of contemplating the
discomfiture brought about by a well-placed literary needle. But it is
difficult to discern what value might accrue to the potential test user from
the kind of review which finds a little to praise and a little to blame in
a test and sums up with an equivocal statement of possible usefulness
ch for intuitive hunches. Fortunately, this sort of review is
not found very often in the Yearbook.
t while the general tone of the book is healthily evaluative, the 
of criticism varies considerably among the reviews. They might
be classified under headings suggested in an excellent review of the
1940 Yearbook (Pedro T. Orata in The Teachers College Journal (Manila),


‘niques of test co 
and tabulation of scores to the higher values that thi 
hould engender in the pupils and in those who use it. 
. . . « criticizes the tesi s 
ments of statistical validity and reliability, t 
commonly accepted techniques of test construction, 
assumes functional validity or su 
make-up of the test. 4 i 

. . . evaluates it mainly from the point of view of its success or failure
to meet the mechanical requirements of efficiency in scoring, adminis-
tration, and tabulation of test results.
of view has merit, to be sure, and each could 
orters to its side. In fact, 


contributors to the volume are experts and qualified to speak and be
heard, they are not all equally sensitized to all phases of psychometry.
In diagnosing human ailments, we don’t call in only the internist or the
neurologist. Nor should we, in examining the test, expect the subject-
matter expert to detect statistical ailments, or the psychometrist to point
out an undernourished teaching objective. We should call in all the




specialists and hold a thorough clinical examination on each case. That 
something like this was intended is indicated by the editor’s statement of 
objectives in the 1940 Yearbook: to provide reviews “written by persons 
of outstanding ability representing various viewpoints... .” But in 
calling in the experts, the editor has made the assignments of cases, which 
implies that he already knew the patients’ needs. Though some such 
procedure is a practical necessity in a venture of this kind, it has the 
definite disadvantage that the treatment of the various types of tests is 
apt to be unbalanced. A cursory survey, for example, indicates that 
nearly all the reviewers assigned to the Achievement Batteries are educa- 
tionists, educational researchers or examiners, and that psychologists and 
psychometrists predominate among the reviewers of Intelligence tests. 

After many hours spent in contemplating the somewhat frightening 
aspect of the Third Yearbook, this reviewer found himself musing about 
the practicability of another kind of volume. This “dream” book would 
not attempt to reproduce verbatim the literary efforts of the experts, 
but would edit and cull from them all the essential materials to fill out 
a standard outline. Spared the necessity of literary composition, re- 
viewers could concentrate on specifics, and could handle more tests with 
no greater expenditure of effort. The outline itself would be drawn up by
a board of outstanding specialists including both test makers and test 
users. It would cover such points as: type of item; sources of items; 
nature of item analysis; descriptions of populations used for item analysis, 
factorial analysis, validation, cross-validation, standardization; judg-
ments of functional validity; adequacy of “coverage”; et cetera, et cetera.
This list is obviously not exhaustive, and many readers may detect a
statistical bias in it. It is for precisely this reason that the board of
experts would be used to insure the inclusion of all important dimensions 
of a test. Finally, there would be the main feature of the book—the 
board of experts’ “seal of approval” for tests which merited adoption 
and use. In this last connection, the reviewer is reminded of the state- 
ment made by Sandiford in commenting on the earlier Educational, 
Psychological, and Personality Tests of 1936 (American Journal of Psy- 
chology, 1938, 51, 200): “. . . Professor Buros’ annual publication 
would be made much more useful if he would mark with a prominent 
star those (tests) which were valid, reliable, and had satisfactory norms. 
Then busy workers could neglect the rest, or if they wasted their money 
on ‘gold bricks,’ the fault would be their own.” This reviewer can think 
of no better way of achieving the objectives of fewer and better tests. 

E. Donald Sisson

Personnel Research Section, AGO,
Department of the Army.


Book Reviews 


Ahern, Eileen. Survey of personnel practices in unionized offices. Re-
search report number 13. New York: American Management As-
sociation, 1948. Pp. 38. $1.50 (non-members, $3.00).


This report consists of twenty frequency tables and accompanying
text relating to practices in unionized offices in matters of union security,
salaries, hours of work, leave of absence, group insurance, seniority,
discharge, grievance adjustment, and other collective bargaining subjects.
The report is based on 50 union contracts believed to be fairly repre-
sentative of the entire AMA collection of 300 office union contracts.
The report will be of interest to only a few psychologists. Those
who are concerned with collective bargaining with office unions or those
who wish to compare their practices with those obtained by employees
through collective bargaining will find the report of some interest subject
to the limitations imposed by a sample of 50 cases and sub-group tabu-
lations based on an N ranging from five to eight.

C. E. Jurgensen


Minneapolis Gas Company 
Achilles, Paul S. Management and the psychologist: A practical guide on
psychology for the business executive. Section II, Book 4, Reading
Course in Executive Technique, Ed. by Carl Heyel. New York:
Funk and Wagnalls Co., 1948. Pp. 64. $1.00.

The sub-title is an exact description of the contents of this little book.
The presentation is concise yet it is surprisingly comprehensive. It is
authoritative, readable and accomplishes its purpose in admirable fashion.
It is just the type of book to place in the hands of the business executive
who has never been exposed to formal psychology but who may be
curious as to just what our discipline is all about.
Only one minor criticism would appear to be justified. Having
whetted the appetite of the business executive, Achilles might well have
included a short selected annotated bibliography for his guidance in case he
might desire to pursue further any phase of the subject.
Donald G. Paterson 


The University of Minnesota 


Linebarger, Paul M. A. Psychological Warfare. Washington, Infantry
Journal, 1948. Pp. 259. $3.50.

The purpose of this book is to tell a layman audience what psycholog-
ical warfare is and how it is fought. Linebarger is Professor of Asiatic


Politics at the Graduate School of Advanced International Studies in 
Washington, D. C. He served in the War Department and in OWI in 
both policy formulation and in field operations. 

The book handles psychological warfare in three parts. Each part 
has three to six chapters. In the first part, Linebarger covers historical 
examples, definitions, limitations and characterizations of national uses of 
psychological warfare in World Wars I and II. The second part is 
devoted to how to analyze and derive military intelligence from propa- 
ganda in order to make an objective appraisal of a given situation in terms 
of psychological warfare. The third phase includes organization, plans, 
operations, and remarks on future problems. 

The strong point of the book is the waggish style. This is exem- 
plified when he pokes fun at the high level policy echelons wherein much 
of the output was classified top-secret and thus removed from usefulness. 
There are seventy excellent figures of propaganda leaflets as well as ten 
organizational charts of various national offices involved in psychological 
warfare. The content is enlivened with descriptions of events such as 
the use of radio-phones in tank warfare to induce Japanese surrenders. 
The three major U. S. lessons from World War II are, he says, that 
atrocity propaganda does not pay, that we have no backlog of trained 
propaganda personnel, and that psychological warfare must be a positive 
function at command level, not a sideline specialty apart from top level 
policy making. 

A weak point of this book is its lack of organization despite the 
promise of the excellent chapter headings. Specific techniques, the root of 
the entire matter from a professional view, are mentioned as the story de- 
velops. They are not consolidated for a comparative analysis of their uses 
and limitations. There is an unusual mixture of the levels of vocabulary. 
Such words as condign, maleficent, and oestrous occur as well as frequent 
references to people going mad with confusion and serious use of Frisco 
for San Francisco. Use of the revised Flesch formulas shows readability
as difficult and style as mildly interesting. Linebarger epitomizes and
tends to rest content with neatly turned phrases. For example, he makes
the point that education is to psychological warfare what a glacier is to 
an avalanche. He neglects to show the crucial differences in bias, in 
use of segmental appeals, and in emotional and authoritarian contexts. 
Professional psychologists may wonder if his two-page discussion of the
role of the psychologist in warfare justifies use of “psychological” in the 
title. The location of the eighty illustrations with reference to the text 
might have been improved. 
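
For readers unfamiliar with the revised Flesch formulas mentioned in this review, the following is a minimal sketch of how the two scores are computed. The coefficients are those of the 1948 revision as commonly cited. The function names, the sample counts, and the score bands used for the verbal labels are illustrative assumptions; none of them are taken from the review, and the counts are not measurements of Linebarger's text.

```python
def reading_ease(syllables_per_100_words, words_per_sentence):
    """Flesch (1948) Reading Ease: higher scores mean easier text."""
    return 206.835 - 0.846 * syllables_per_100_words - 1.015 * words_per_sentence


def human_interest(personal_words_per_100_words, personal_sentences_per_100_sentences):
    """Flesch (1948) Human Interest: higher scores mean livelier text."""
    return (3.635 * personal_words_per_100_words
            + 0.314 * personal_sentences_per_100_sentences)


def label(score, bands):
    """Return the label of the first band whose upper limit exceeds the score."""
    for upper_limit, name in bands:
        if score < upper_limit:
            return name
    return bands[-1][1]


# Score bands as commonly quoted for the 1948 formulas (assumed here).
EASE_BANDS = [(30, "very difficult"), (50, "difficult"), (60, "fairly difficult"),
              (70, "standard"), (80, "fairly easy"), (90, "easy"), (101, "very easy")]
INTEREST_BANDS = [(10, "dull"), (20, "mildly interesting"), (40, "interesting"),
                  (60, "highly interesting"), (101, "dramatic")]

# Hypothetical counts for a sample passage (assumption for illustration only).
ease = reading_ease(syllables_per_100_words=165, words_per_sentence=25)
interest = human_interest(personal_words_per_100_words=3,
                          personal_sentences_per_100_sentences=5)
print(round(ease, 1), label(ease, EASE_BANDS))
print(round(interest, 1), label(interest, INTEREST_BANDS))
```

A passage with long words and long sentences but few personal references lands, under these assumed bands, in the same two categories the reviewer reports.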

In summary, Linebarger’s book presents “a patchwork of enthu-
siastic recollection” as he calls it. Although some professional readers
may be disappointed, the fact that it is a lively entry in a relatively
undeveloped field makes the book worthwhile for his intended audience.

Clark L. Hosmer
Col. U. S. Air Force


Terman, L. M., and Oden, Melita H. The gifted child grows up: Twenty-
five years’ follow-up of a superior group. Stanford, California: Stan-
ford Univ. Press, 1947. Pp. xiv, 448. $6.00.


As stated in the preface, the volume “is an over-all report of the work
with the California group of gifted subjects from 1921 to 1946, the
greater part of it being devoted to a summary of the follow-up data ob-
tained in 1940 and 1945; at the latter date the average age of the group
was approximately thirty-five years.”
The first six chapters are a résumé of the earlier work, reported in
two previous monographs. When selected in 1921-3, the
pre-high-school subjects had an average chronological age of 9.7
years, and the 420 high school cases, 15.2 years; I. Q.’s ranged from 135
with a mean of about 150. It was estimated that the group was
in the highest one per cent in ability as measured. Thirty-one per cent
of the fathers were professional men; 60% of the homes were rated as
superior; relatives included many individuals of note. In 1923, thirty-
seven anthropometric measurements of 59 per cent of the cases showed
“. . . the selected group was slightly superior physi-
cally to the various groups used for comparison.” Health histories and
. . . compared with the average child; puberty tended to be reached a little
earlier. In school 85 per cent were accelerated in grade placement;
nevertheless, tests showed over half to have mastery of subject matter
two grades yet further ahead. Interests of the gifted were livelier, more
diverse, more intellectual, and somewhat more social than for average
children. Tests and ratings of character traits also showed superiority
of the gifted. A second survey six years later yielded results substantially
consistent with the first findings.

Chapters 7-19 are concerned with follow-ups in 1940 by inquiry forms
and field workers where possible, and by inquiry forms in 1945-6. That
cooperation was outstanding is indicated by returns of information from
. . . per cent of all living subjects. Mortality was to date found less than
in the general population, physique and health superior, and maladjust-
ment, delinquency and insanity less frequent than in the general popula-
tion. Of the total group, 70 per cent of the men and 67 per cent of the
women had by 1945 graduated from college (as compared with 5 per
cent of the general population); 34 per cent of the men took one or more




graduate degrees; academic records were superior, median age of gradua- 
tion was over a year younger than usual. Nevertheless, gifted students 
participated more in extra-curricular activities than the average student. 
Approximately 71 per cent of the men were in professional or superior 
business occupations in 1940, or 5 times as many as for California men in 
general; income was higher than for college graduates in general. Avoca- 
tional interests were diverse and rich. Attitudes were middle-of-the-road. 
More had married and at earlier ages than for college graduates in general, 
but divorce was less than half as frequent; happiness in marriage was 
rated high, and sex adjustment appeared in no way atypical. 

Chapters 20-26 deal with somewhat special problems. Accelerates 
in school were found greatly to excel non-accelerates in the group, in 
achievement on a test battery in 1922; in 1940, over twice as many 
accelerates were in the top group in vocational success. Accelerates 
married earlier, and appeared not handicapped in adjustment or in 
physical or mental health. Special study of the subjects with I. Q.’s of 
170 or above show them “about as successful as lower testing subjects in 
social adjustments,” and they accomplish more. Subjects of Jewish 
descent differed little from the non-Jewish “except in their greater drive 
for vocational success, their somewhat greater tendency toward liberalism 
in political attitudes, their somewhat lower divorce rate.” A vigorous 
chapter on factors in the achievement of gifted men showed the most 
successful distinguished primarily not by intelligence but especially by 
drive to achieve, and by all-round adjustment; outstanding accomplish- 
ment was not associated with marked emotional tensions but rather 
with stability and freedom from excessive frustration. War records 
were good. A careful chapter on the appraisal of achievement empha- 
sized the variety of possible values, the possibility that admirable achieve- 
ment might not involve eminence, and the need for later data if appraisals 
of accomplishment are to be adequate. The final chapter stresses the 
importance of future follow-ups, and over-views the total investigation 
in larger perspectives. 

In total, then, the volume is an outstanding example of that most rare, 
but probably most valuable type of psychological investigation—the 
broadly conceived, long-time developmental study. The subjects were 
that portion of the total population most valuable to society. For all 
who are interested in problems of human personality in its finest poten-
tialities, or the most challenging opportunities in education and guidance,
the volume should be a “must.” 


Sidney L. Pressey
Ohio State University


New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should 
be sent to Donald G. Paterson, Editor, Department of Psychology, 
University of Minnesota, Minneapolis 14, Minnesota


ABC’s of scapegoating. Revised edition. Gordon W. Allport. New
York: Anti-Defamation League of B’nai B’rith, 1948. Pp. 56. $.20.

Practical psychology. Karl S. Bernhardt. New York: McGraw-Hill
Book Co., Inc., 1948. Pp. 319. $2.50.

Industrial psychology and its social foundations. Milton L. Blum. New
York: Harper and Brothers, 1949. Pp. 518. $4.50.

Student personnel services in general education. Paul J. Brouwer. Wash-
ington, D. C.: American Council on Education, 1949. Pp. 317. $3.50.

Managers, men and morale. Wilfred B. D. Brown and Winifred Raphael.
London, Eng.: MacDonald and Evans, 1948. Pp. 163. 10/6.

The third mental measurements yearbook. Oscar K. Buros, Editor. New
Brunswick: Rutgers University Press, 1949. Pp. 1047. $12.50.

Personal adjustment in old age. Ruth Shonle Cavan, Robert J. Havig-
hurst, Ernest W. Burgess, and Herbert Goldhamer. Chicago: Science
Research Associates, 1949. Pp. 175. $2.95.

The psychology of social classes. Richard Centers. Princeton: Princeton
University Press, 1948. Pp. 432. $5.00.

Psychologist unretired. Miriam Allen deFord. Stanford: Stanford Uni-
versity Press, 1948. Pp. 127. $3.00.

Film and education. Godfrey Elliott, Editor. New York: Philosophical
Library, 1948. Pp. 597. $7.50.

The energetics of human behavior. G. L. Freeman. Ithaca: Cornell
University Press, 1948. Pp. 352. $3.50.

Man’s place in God’s world. Sol W. Ginsburg. New York: Hebrew
Union College, Jewish Institute of Religion, 1949. Pp. 30. $.50.

4-square planning for your career. S. A. Hamrin. Chicago: Science
Research Associates, 1948. Pp. 200. $2.95.

Adolescent character and personality. Robert J. Havighurst and Hilda
Taba. New York: John Wiley and Sons, Inc., 1949. Pp. 315.
00,

How to create job enthusiasm. Carl Heyel. New York: McGraw-Hill
Book Co., Inc., 1948. Pp. 248. $3.00.

Psychology and ethics. Harry L. Hollingworth. New York: Ronald
Press Co., 1949. Pp. 247. $3.50.



Applied psychology. Revised edition. Richard Wellington Husband. 
New York: Harper and Brothers, 1949. Pp. 845. $4.50. 

Theory and problems of social psychology. David Krech and Richard 
Crutchfield. New York: McGraw-Hill Book Co., Inc., 1949. Pp. 
639. $4.50. 

Discovering your real interests. G. Frederic Kuder and Blanche B. 
Paulson. Chicago: Science Research Associates, 1949. Pp. 48. 
Single copy, $.75. Fifteen or more copies, $.60. 

Personality projection in the drawing of the human figure. Karen Mac- 
hover. Springfield, Ill.: Charles C. Thomas, Publisher, 1949. Pp. 
181. $3.50. 

Psychological statistics. Quinn McNemar. John Wiley and Sons, Inc., 
1949. Pp. 364. $4.50. 

Workers wanted. E. William Noland and E. Wight Bakke. New York:
Harper and Brothers, 1949. Pp. 224. $3.00. 

The procurement and training of ground combat troops. Robert Palmer, 
Bell I. Wiley, and William R. Keast. Washington, D. C.: Super- 
intendent of Documents, U. S. Government Printing Office, 1948. 
Pp. 696. $4.50. 

Industrial hygiene and toxicology. Volume I. Frank A. Patty. New 
York: Interscience Publishers, Inc., 1948. Pp. 531. $10.00. 

Machine computation of elementary statistics. Katharine Pease. New 
York: Chartwell House, Inc., 1949. Pp. 238. $2.75. 

Job horizons. Lloyd G. Reynolds and Joseph Shister. New York: 
Harper and Brothers, 1949. Pp. 102. $2.25.

Human relations in an expanding company. Frederick L. W. Richardson
and Charles R. Walker. New Haven, Connecticut: Yale Labor and 
Management Center, 1948. Pp. 95. $1.50. 

Company annual reports to stockholders, employees, and the public. Thomas 
H. Sanders. Boston: Division of Research, Harvard Business School, 
1949. Pp. 838. $3.75. 

An outline of social psychology. Muzafer Sherif. New York: Harper 
and Brothers, 1948. Pp. 479. $4.00. 

Government regulation of industrial relations. George W. Taylor. New 
York: Prentice-Hall, Inc., 1948. Pp. 383. $4.00. 

Social class in America. W. Lloyd Warner and Kenneth W. Eells. 
Chicago: Science Research Associates, 1949. Pp. 292. $4.25. 

Constructing classroom examinations. Ellis Weitzman and Walter J.
McNamara. Chicago: Science Research Associates, 1949. Pp. 140.

Human behavior and the principle of least effort. George Kingsley Zipf. 

Cambridge: Addison-Wesley Press, Inc., 1949. Pp. 650. $6.50. 




on industrial relations. American Journal of Sociology. 
1949 issue. Chicago: University of Chicago Press, $1.25. 
cting the satisfactions of home economics teachers. AVA Re- 
Bulletin No. 3. Washington, D. C.: Committee on Research 
Publications, American Vocational Association, Inc., 1948. Pp. 
$.75. 

Hours of work and output. Bulletin No. 917. Bureau of Labor Statistics.
Washington, D. C.: Superintendent of Documents, U. S. Government
Printing Office, 1948. Pp. 160. $.35.

under the LM EA, relation of wages to productivity. Personnel 
umber 122. New York: American Management Association, 
Pp. 63. $1.25. 

and group relations. Work of Progress Series. New York: 
ean Council on Education, 1948. $1.25. 

house in industry. Chicago: National Metal Trades Association, 
ith Michigan Avenue, 1948. Pp. 27. 

The UAW-CIO looks at time study. Detroit: UAW-CIO Education
Department, 28 West Warren Street, 1947. Pp. 32. $.50.
suggestion programs in the iron and steel industry. New York: 
an Iron and Steel Institute, 350 Fifth Avenue, 1948. Pp. 92. 
oning in textile mills: the case for temperature and humidity con- 
provide comfort, health, safety, and optimum production. New 
Research Department, Textile Workers Union of America, 99 
y Place, 1948. Pp. 60. 


1948 DIRECTORY 


AMERICAN PSYCHOLOGICAL ASSOCIATION 


1515 MASSACHUSETTS AVENUE NORTHWEST 
WASHINGTON 5, D.C. 




This directory gives biographical data and fields of 
interest for the Fellows and Associates of the American 
Psychological Association. Membership lists for the 
Divisions of the Association, the by-laws, a list of past 
officers and meeting places, and a geographical and in- 
stitutional index of members are included. The di- 
rectory is edited by Helen M. Wolfle of the Association 
staff. 


According to present plans of the Association, an-
other directory including biographical data will not
be published until 1951. The interim issues will con-
tain the names of the members, their addresses, and
their present positions.


438 pages 


Journal of Applied Psychology


June, 1949 


Alexander Mintz and Milton L. Blum 
College of the City of New York 


generally accepted that certain individuals consistently have 
accidents while others do not. This is commonly known as the 
of accident proneness. A critical examination of the data 
in the literature points to the desirability of reconsidering the 
nce attached to the principle of accident proneness. 
article has two objectives: (1) To indicate that one of the methods 
te the principle of accident proneness is unsound and to 
at its use has led in some instances to exaggerated views of dif- 
in accident proneness; and (2) To propose a method whereby 
tive estimates of differences in accident liability’ may be ob- 
and to point out the conditions when it may be used. 
Statistical evidence for the principle of accident proneness was
presented by Greenwood and Woods (6) in 1919. These authors compared
the distribution of accidents in a given population with a simple
chance distribution for the same number of accidents in a population of
the same size. Evidence of differences in accident proneness was
obtained: it was discovered that more people had no accidents than might
have been expected “by chance.” Conversely, it was discovered that more
people had many accidents than would have been expected in accordance
with a simple chance distribution. In other words, Greenwood and
Woods demonstrated that the obtained distributions of accidents differed
significantly from chance expectancy. Furthermore, they showed that
most of their distributions agreed with theoretically computed
distributions based on the assumption that people differed from each
other in their likelihood to have accidents.
Newbold (9) further investigated this problem and pointed out that
differences in accident liabilities could not be entirely explained simply
in terms of different job hazards. In addition, Newbold, in some of her
work, compared the accident rates for the same people in two successive
periods and reported that significant correlations existed.

1 In the subsequent discussions we shall use the expression “accident
proneness” in referring to personal characteristics of people
contributing to the likelihood of their having accidents. The expression
“accident liability” will refer to both personal characteristics and
stable environmental conditions contributing to accident records.

Both Greenwood and Woods, and Newbold were primarily interested 
in the establishment of the existence of a difference between accident 
records and chance expectancy. In this they were successful and ac- 
cordingly the principle of accident proneness was established. 

However, another method has been used to support the principle of 
accident proneness. A number of investigators and writers of books on 
industrial psychology have pointed out that small percentages of people 
have large percentages of accidents and have presented data accordingly. 
In this method the obtained accident distribution is presented as evidence 
for the principle of accident proneness without a comparison to the dis- 
tribution that would be normally expected “by chance,” i.e., if all indi- 
viduals were equally liable to accidents. This method is fallacious. 


The Method of Percentages 


The method of percentages of people and accidents implies an in- 
correct assumption, viz., that chance expectation requires that all people 
in a population should have the same number of accidents. This is not 
the case. An obvious limitation that has often been overlooked is the 
fact that very often the reported total number of accidents in a popula- 
tion is smaller than the number of people in the population. For example, 
if a group of one hundred factory workers had fifty accidents in one 
year, then a maximum of fifty people could have contributed to the 
accident record and accordingly a maximum of 50% of the population 
would have contributed to 100% of the accidents. Obviously a small 
percentage of the population in this case does not establish the principle of 
accident proneness. However, the number of employees having accidents 
is almost certain to be less than fifty since there is no reason to believe 
that each one should have had only one accident. Such an assumption 
would imply that an accident immunizes its victim against further acci- 
dents. If one makes the assumption of equal liability, the people who 
had one accident should be just as liable to have future accidents as 
those who have not had any. Thus if accident liability is unchanged 
by accidents already had, some people should have two accidents before 
others have had any. In fact, in accordance with chance expectancy
some people should have had three or more accidents before another had
a single accident. In dealing a deck of cards it is not improbable that a
person will receive more or less than the three or four cards in a suit
that seem to be his share. He may get six, seven or more such cards
without any laws of probability being violated. Similarly, a person
may have more accidents than seems to be his share in a given population
without being more accident prone than the average.
Thus the assumption of equal accident liability results in different
accident totals for the individuals within the group. The resulting
distribution can be readily derived from the statement, “the current rate
at which accidents occur per person is identical in groups of people with
different numbers of accidents in the past.” It follows directly from
this statement that as the number of people who have had no accidents
decreases, fewer people are likely to have first accidents per unit of time;
as the number of people who have had first accidents increases, the rate
of occurrence of second accidents increases proportionately. These and
similar statements can be reformulated as a set of differential
equations, and the solution of this set of equations gives the terms of the
Poisson distribution. Greenwood and Yule (7) first demonstrated its
applicability to the accident problem. The Poisson is a discrete
distribution rather than a continuous one. As applied to the accident
problem, its consecutive terms give the predicted numbers of people
who had no accidents, one accident, two accidents, etc. The terms are

$$N e^{-m},\quad N e^{-m} m,\quad N e^{-m}\frac{m^{2}}{2!},\quad N e^{-m}\frac{m^{3}}{3!},\quad \text{etc.,}$$

where N is the number of people, e is the constant 2.71828..., and m is
the mean number of accidents per person.
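The derivation just described can be checked numerically. The following is a minimal sketch, not the authors' computation: it assumes a unit accident rate, writes the differential equations as dN_k/dt = N_{k-1} - N_k, integrates them by simple Euler steps, and compares the result with the closed-form Poisson terms. The function names and the step count are illustrative choices.

```python
import math


def poisson_terms(n_people, mean, max_k):
    """Expected numbers of people with 0..max_k accidents: N e^{-m} m^k / k!."""
    return [n_people * math.exp(-mean) * mean ** k / math.factorial(k)
            for k in range(max_k + 1)]


def integrate_equal_liability(n_people, mean, max_k, steps=20000):
    """Euler integration of dN_k/dt = N_{k-1} - N_k (unit accident rate).

    Running the equations for a total time equal to `mean` gives each
    person `mean` accidents on average.
    """
    counts = [float(n_people)] + [0.0] * max_k
    dt = mean / steps
    for _ in range(steps):
        previous = counts[:]
        counts[0] -= previous[0] * dt
        for k in range(1, max_k + 1):
            counts[k] += (previous[k - 1] - previous[k]) * dt
    return counts


# The example from the text: 100 workers, 50 accidents in the year (m = 0.5).
closed_form = poisson_terms(100, 0.5, 4)
numerical = integrate_equal_liability(100, 0.5, 4)
for k, (exact, approx) in enumerate(zip(closed_form, numerical)):
    print(f"{k} accidents: {exact:6.2f} (formula)  {approx:6.2f} (integrated)")
```

For the hundred factory workers with fifty accidents, the terms give roughly 61 workers with no accident and roughly 9 with two or more, although all workers were assumed equally liable.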
A number of sets of data will now be discussed in order to illustrate
the inadequacy of the method of percentages of people and accidents.
Based upon original records obtained by the authors from a foundry
it was found that 1.8% of the 280 men in the day shift had 11.4% of the
accidents; 10% of the men had 44.3% of the accidents. In the night
shift 5.8% of the 120 men had 12.5% of the accidents and 37.5% of the
men had all of the accidents. A computation of the distribution of
accidents in accordance with chance expectancy (equal liability
distribution) indicated that the differences between the obtained and
expected distributions were not significant. In accordance with the
theoretical distribution, 1.4% of the people should have had 8.3% of the
accidents and 8.9% of the people should have had 38.8% of the accidents.
These percentages obtained from a theoretically computed equal liability
distribution show that the accident distribution actually obtained is in
accordance with chance expectancy and does not establish the existence of
accident proneness.
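Percentages of this kind can be derived from any simple Poisson distribution. The sketch below assumes a mean of 0.5 accidents per man; that value is hypothetical and is not taken from the foundry records. The function name `concentration` is likewise illustrative.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k accidents under a simple Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def concentration(mean, min_accidents, max_k=60):
    """Percent of people with at least `min_accidents`, and percent of all
    accidents that those people hold, under a simple Poisson distribution."""
    people = sum(poisson_pmf(k, mean) for k in range(min_accidents, max_k + 1))
    accidents = sum(k * poisson_pmf(k, mean)
                    for k in range(min_accidents, max_k + 1)) / mean
    return 100 * people, 100 * accidents


ASSUMED_MEAN = 0.5  # hypothetical accidents per man, not the foundry's actual figure
for threshold in (1, 2, 3):
    pct_people, pct_accidents = concentration(ASSUMED_MEAN, threshold)
    print(f"{threshold}+ accidents: {pct_people:.1f}% of people, "
          f"{pct_accidents:.1f}% of accidents")
```

With this assumed mean, about 9% of the people hold about 39% of the accidents by chance alone, which is the kind of figure the method of percentages would present as evidence of accident proneness.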
A study that is often referred to in discussions of accident proneness
is that of the National Association of Taxicab Owners and the
Metropolitan Life Insurance Company (11). These data deal with the records
of 1294 drivers employed by several taxicab companies. Viteles (13)
states that “the incidence of accident proneness in the operation of motor
vehicles has been well demonstrated in this study.” “It is interesting to
note that the data obtained in accident prone studies in other types of
industries if plotted would closely conform to the curve shown. . . .”

Neither the authors of the report nor the author of the textbook 
compared the data with the simple chance distribution. Such computa- 
tions have been made and are presented in Figure 1. 


[Figure 1: cumulative percent of accidents plotted against cumulative
percent of drivers; two curves, labeled OBTAINED and POISSON.]

Fig. 1. Relationship between cumulative percentage of taxi drivers and of accidents.

The solid line in the figure represents the cumulative percentages 
of accidents corresponding to cumulative percentages of drivers, based 
on the data as quoted in the original report. The dotted line represents 
the corresponding cumulative percentages from an equal liability dis- 
tribution. 
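Curves of the kind shown in Figure 1 can be built from any table of accident frequencies. The sketch below uses hypothetical counts, since the taxicab table itself is not reproduced here; the function name and the data are illustrative only.

```python
def cumulative_curve(people_by_accidents):
    """Return (cumulative % of people, cumulative % of accidents) points.

    `people_by_accidents` maps a number of accidents to the number of people
    with that record; people are accumulated from the fewest accidents up.
    """
    total_people = sum(people_by_accidents.values())
    total_accidents = sum(k * n for k, n in people_by_accidents.items())
    points, people, accidents = [], 0, 0
    for k in sorted(people_by_accidents):
        people += people_by_accidents[k]
        accidents += k * people_by_accidents[k]
        points.append((100 * people / total_people,
                       100 * accidents / total_accidents))
    return points


# Hypothetical frequency table: 50 people with no accident, 30 with one, 20 with two.
hypothetical = {0: 50, 1: 30, 2: 20}
for pct_people, pct_accidents in cumulative_curve(hypothetical):
    print(f"{pct_people:5.1f}% of people -> {pct_accidents:5.1f}% of accidents")
```

Applying the same function to the obtained frequencies and to the expected Poisson frequencies gives the two lines to be compared.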

The two lines are obviously very similar in shape. The argument (13)
could be repeated verbatim with percentages from the chance distribution
substituted for obtained percentages, with very little loss in apparent
[...]. [...] would have had no accidents instead of the obtained 25.2%.
The best and the [...] 60% would have had 18.3% and 81.7% of the
accidents respectively instead of the actually obtained 17.2% and 82.8%.
The worst [...] of the drivers would have had 63.9% of the accidents
(instead of [...]); the worst 10% would have had 24.7% instead of 31.9%
of the accidents.

In spite of the fact that the two distributions are very similar in shape,
the difference between them is statistically significant, the chi square
being 122.77 (d.f. = 6, P < .0001). In other words, factors other than
chance factors are definitely present but do not markedly change
the shape of the chance distribution.
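A chi square of this kind compares obtained frequencies with the frequencies expected from a Poisson distribution of the same mean. The sketch below uses hypothetical counts rather than the taxicab data, pools the upper tail into one class, and treats that pooled class as exactly three accidents when estimating the mean; all of these are simplifying assumptions.

```python
import math


def poisson_expected(n_people, mean, n_classes):
    """Expected counts for 0, 1, ..., n_classes - 2 accidents, with every
    higher count pooled into one final class."""
    probs = [math.exp(-mean) * mean ** k / math.factorial(k)
             for k in range(n_classes - 1)]
    probs.append(1.0 - sum(probs))
    return [n_people * p for p in probs]


def chi_square(observed, expected):
    """Pearson chi square: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts of people with 0, 1, 2 and 3-or-more accidents.
observed = [130, 50, 14, 6]
n_people = sum(observed)
# Rough mean: the pooled class is counted as exactly 3 accidents.
mean = sum(k * n for k, n in enumerate(observed)) / n_people
expected = poisson_expected(n_people, mean, len(observed))
statistic = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 2  # one lost to the total, one to the fitted mean
print(round(statistic, 2), degrees_of_freedom)
```

A large sample can make a small difference in shape statistically significant, which is why a significant chi square and two nearly identical curves are compatible.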

Another often referred to study on accident proneness is the one
reported by Slocombe and Brakeman (12). Their data are based upon
the records of 2300 men employed by the Boston Elevated Railway
Company.

In discussing their data as indicative of differences in accident
proneness, Slocombe and Brakeman classified the men with four or less
accidents as “low accident men” and those with five or more accidents as
“high accident men.” This arbitrary division placed 1828 men in the
former category and 472 men in the latter division. The “low accident”
men averaged 2.1 accidents while the “high accident” men averaged 7
accidents. Slocombe and Brakeman did not compute the chance
expectancy of the number of men having four accidents or less. Actually,
in a simple chance distribution, 1824 men should be expected to be in
this category and so only four more men of the total 2300 are in the “low
accident” group than obtained by chance. According to chance
expectancy, the “low accident” and “high accident” men should have
averaged 2.4 accidents per man and 5.8 accidents per man respectively.
This difference is not much smaller than the one actually obtained. This
does not mean that there is no evidence for differences in accident
proneness in the data. It merely means that Slocombe and Brakeman’s
argumentation is inconclusive.
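The chance expectation for such a two-way split can be sketched as follows. The overall mean used here is an assumption: it is back-calculated from the rounded group averages quoted above, so the expected size of the low group only approximates the figure of 1824 given in the text.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k accidents under a simple Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def chance_split(n_people, mean, cutoff):
    """Expected size and mean accident count of the groups with at most
    `cutoff` accidents and with more than `cutoff`, under equal liability."""
    p_low = sum(poisson_pmf(k, mean) for k in range(cutoff + 1))
    accidents_low = sum(k * poisson_pmf(k, mean) for k in range(cutoff + 1))
    low_mean = accidents_low / p_low
    high_mean = (mean - accidents_low) / (1.0 - p_low)
    return n_people * p_low, low_mean, high_mean


# Assumption: the overall mean is back-calculated from the rounded group
# figures quoted in the text (1828 men at 2.1, 472 men at 7 accidents).
approx_mean = (1828 * 2.1 + 472 * 7) / 2300
low_count, low_mean, high_mean = chance_split(2300, approx_mean, cutoff=4)
print(round(low_count), round(low_mean, 1), round(high_mean, 1))
```

Under these assumptions the group averages come out at about 2.4 and 5.8 accidents per man, in line with the chance figures quoted above.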
More recent data based upon a random sample of licensed drivers
in the state of Connecticut (2) have been analyzed by Cobb (1). He
computed the amount by which the variance of accident records exceeds
the variance of the Poisson distribution and thus determined that these
accident records cannot correlate with a perfect test of accident
proneness to a degree higher than +.44.

DeSilva (2) refers to these data and uses as argument for the principle
of accident proneness mainly the fact that four per cent of the drivers
were responsible for 36% of the accidents. In a simple chance
distribution 2.4% of the drivers would be responsible for 21.2% of the
accidents. Again a comparison of percentages of people and of accidents
is inconclusive. The figures just quoted based on the assumption of a
simple chance distribution look almost as impressive as the figures in
the actually obtained distribution.²


Quantitative Estimate of Differences in Accident Liability 


It is possible to arrive at an estimate of the magnitude of differences 
in accident liability (as distinguished from differences in accident records) 
in the case of many populations. The procedure has been previously 
used by Cobb (1) as a step in estimating the maximum correlation between 
accident records and psychological tests. This procedure can be used in 
many instances to estimate the magnitude of differences in accident 
liability, but it is also necessary to mention that this procedure is not 
universally applicable. 

The presence of differences in accident liability of individuals in a 
population results in a composite of Poisson distributions of the accident 
records. The reason for this is as follows: Each particular degree of 
accident liability present in a population should result in a Poisson dis- 
tribution of the accident records. When two or more degrees of accident 
liability are present the resulting distribution is the sum of the two or 
more corresponding Poisson distributions. If the distribution of accident 
liability is a continuous function the resulting probability function of 
accidents is a composite of Poisson distributions which can be deter- 
mined by integration. 
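For a discrete set of liabilities the composite distribution can be sketched directly. The population below is hypothetical (80% of people with a mean of 0.3 accidents, 20% with a mean of 1.5); the function names are illustrative.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k accidents under a simple Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def composite_pmf(k, groups):
    """Probability of k accidents when `groups` lists (share, mean) pairs."""
    return sum(share * poisson_pmf(k, mean) for share, mean in groups)


# Hypothetical population: 80% with liability 0.3, 20% with liability 1.5.
groups = [(0.8, 0.3), (0.2, 1.5)]
ks = range(80)  # enough terms for the tail to be negligible
mean = sum(k * composite_pmf(k, groups) for k in ks)
variance = sum((k - mean) ** 2 * composite_pmf(k, groups) for k in ks)
liability_variance = sum(share * (m - mean) ** 2 for share, m in groups)
print(round(mean, 4), round(variance, 4), round(liability_variance, 4))
```

In this example the variance of the composite exceeds its mean, and the excess equals the variance of the group liabilities.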

When a given distribution of accident records is found to conform 
closely to a composite of Poisson distributions the evidence is consistent 
with the assumption that the differences between the accident records 
of different people are due partly to differences in their accident liability 
and partly to “chance” factors not predictable in terms of knowledge of 
the people or of their accident records. In this assumption, the “chance” 
factors produce the variability within the constituent Poisson distribu- 
tions while the differences in accident liability are responsible for the 
differences between their means. In accordance with such an
assumption, one may analyze the obtained variance of a set of accident
records into two constituent variances and view one of them as
representing the operation of the “chance” factors, the other as
characterizing the differences in accident liability. The former is the
weighted arithmetic average of the variances of the Poisson
distributions. As Cobb has shown, its value can be readily estimated as
equal to the mean number of accidents per person.³ Thus the residual
variance representing the operation of differences in accident liability
may be estimated if one subtracts the mean number of accidents per person
from the obtained variance of accident records. We have performed this
computation for a considerable number of accident distributions and have
expressed the resulting variances attributable to unequal accident
liabilities as percentages of the corresponding total variances of
accident records.

2 Tables 1 (Foundry Data), 2 (Taxicab), 3 (Street Car Drivers), 4 (Auto Drivers),
7 (Newbold’s Data), and 9 (Conn. car drivers) have been deposited with the American
Documentation Institute to reduce printing costs. For these six pages of tables order
Document 2633 from American Documentation Institute, 1719 N Street, N.W.,
Washington 6, D. C., remitting $0.50 for microfilm (images 1 inch high on standard
35 mm. motion picture film) or $0.60 for photocopies (6 x 8 inches) readable without
optical aid.
The argument of the last paragraph presupposes that the obtained
accident distribution approximates a composite Poisson distribution.
Theoretically, an infinite variety of such distributions could be computed,
depending on the assumed form of the distribution of the means of the
Poisson distributions. Actually only one kind of such composites seems to
have been used in research, viz., Greenwood and Yule’s (7) “unequal
liability distribution” (“UD”). This distribution is based on the
assumption that accident liability of people is distributed along a Pearson
Type III curve, a continuous skewed unimodal curve. Its equation may
be found in several sources, e.g. (3), (8). Many sets of accident data can
be actually approximated by composite Poisson distributions based on
such assumed distributions of accident liability. It should be noted,
however, that Greenwood and Yule’s “UD” distribution is by no means
the only possible unequal liability (composite Poisson) distribution.
Greenwood and Yule (7) report a set of equations for a different type
of composite Poisson distribution, based on the assumption that accident
liability is normally distributed. This distribution does not seem to have
been used in research. The possibility should not be overlooked that
this distribution or still another composite Poisson distribution, based
on some other assumed distribution of accident liability, might prove
to be useful in research. In this paper, composite Poisson distributions
based on the Pearson III curve were used most of the time. In a few
instances another possibility was explored to some extent; some sets of
data suggested discontinuous distributions of accident liability, the
discontinuity being due to the presence of small numbers of deviant
individuals. On the other hand, the presented analysis of the sample
variance into two components is not legitimate if the obtained
distribution deviates significantly from any composite Poisson distribution.
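A composite of Poisson distributions whose means follow a Pearson Type III (gamma) curve is known in present-day terminology as the negative binomial distribution. The sketch below fits it by matching the mean and variance of the accident records; this moment fit and the function name are assumptions made for illustration, not a reproduction of the computations reported in this paper.

```python
import math


def unequal_liability_pmf(k, mean, variance):
    """Negative binomial probability of k accidents, with parameters chosen
    so that the distribution has the given mean and variance."""
    if variance <= mean:
        raise ValueError("variance must exceed the mean")
    shape = mean ** 2 / (variance - mean)  # shape of the assumed liability curve
    p = mean / variance
    log_prob = (math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
                + shape * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_prob)


# Mean and variance of one row of Table 5 (647 cases).
n_cases, mean, variance = 647, 0.465, 0.691
expected = [n_cases * unequal_liability_pmf(k, mean, variance) for k in range(6)]
print([round(x, 1) for x in expected])
```

The expected frequencies printed here can then be compared with the obtained frequencies by the chi square technique.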
3 This follows from the fact that in a simple Poisson distribution the variance is
always equal to the mean. Hence, in a composite of such distributions, the mean of
the variances is equal to the mean of the means.




The line of reasoning just developed will now be applied to the more 
widely known studies of accident proneness. 

The Greenwood and Woods study (6) presents fourteen sets of data. 
The majority of their findings agree rather well with the composite 
Poisson distributions computed according to Greenwood and Yule (7). 
In other words, the obtained figures are in accord with the assumptions: 


1. Accident proneness varies from person to person and its distri- 
bution is represented by a unimodal continuous skewed curve known 
as Pearson type III. 

2. Accident proneness of a person is unaltered by accidents he 
may have. 


Twelve of the fourteen sets of data do not differ significantly from 
the corresponding theoretically computed figures. The P’s reported by 
Greenwood and Woods obtained from the chi square technique range 
from 0.15 to 0.93.⁴ The sum of the chi squares for these 12 sets of data,
based upon our computations, is 35.33, which for 30 degrees of freedom 
results in a P equal to about .23. The two deviant distributions will be 
discussed later. 
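The probabilities quoted for the chi squares can be checked without tables. The sketch below handles only an even number of degrees of freedom, for which the upper-tail probability has a closed form; the function name is illustrative.

```python
import math


def chi_square_p_value(statistic, degrees_of_freedom):
    """Upper-tail probability of chi square, for an even number of degrees
    of freedom only (where the tail sum has a closed form)."""
    if degrees_of_freedom <= 0 or degrees_of_freedom % 2:
        raise ValueError("this sketch handles positive even d.f. only")
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, degrees_of_freedom // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


print(round(chi_square_p_value(35.33, 30), 2))
print(chi_square_p_value(122.77, 6) < 0.0001)
```

The first call corresponds to the pooled chi square of 35.33 for 30 degrees of freedom, the second to the taxicab chi square of 122.77 for 6 degrees of freedom.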

Thus it is possible to approximate closely the majority of Greenwood
and Woods’ tables by theoretically computed distributions based on the 
assumptions that accident proneness is constant for each person and 
distributed in different people in accordance with a Pearson III curve. 
This finding is one of the principal ones in favor of the existence of dif- 
ferences in accident proneness. 

How large then are these differences in accident proneness if we take
the findings at their face value and assume that variations in “chance”
and differences in accident proneness are the only factors accounting for
these distributions of accident records? Table 5 presents the data
pertinent to the relative size of these differences in the Greenwood and
Woods study.

For each one of Greenwood and Woods’ tables the estimated per- 
centage of the variance of accident records attributable to differences in 
accident liability is given. As stated on a preceding page, the estimated 
variance of accident liability is the difference between the obtained 
variance of accident records and the mean number of accidents. Di- 
viding this difference by the variance of accident records we obtain the 
percentage of the variance attributable to differences in accident lia- 
bility. In addition, the following data are also given: the number of 
cases, the mean and the variance of accident records. 
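The computation described in this paragraph amounts to one line of arithmetic. The sketch below applies it to three rows of Table 5; the function name is illustrative.

```python
def liability_share(mean, variance):
    """Percent of the variance of accident records in excess of the mean,
    i.e. (variance - mean) / variance * 100."""
    return 100.0 * (variance - mean) / variance


# (number of cases, mean, variance) for three rows of Table 5.
rows = [(647, 0.465, 0.691), (100, 3.040, 6.938), (136, 0.794, 0.928)]
for n_cases, mean, variance in rows:
    print(n_cases, round(liability_share(mean, variance), 1))
```

A negative result, which arises when the obtained variance is smaller than the mean, indicates that there is no excess variance to attribute to differences in accident liability.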

4 The computations do not appear to be accurate in all cases. It is to be noted that
the paper appeared in 1919 prior to Fisher's pointing out the procedure for determining
[...]


Table 5

Percentages of Variance Attributable to Differences in Accident Liability,
from Greenwood and Woods' Original Data

Number                  Obtained
of                      Variance
Cases     Mean (m)      ($\sigma^2$)     $(\sigma^2 - m)/\sigma^2 \times 100$

750       0.576         0.540            —
580       0.478         0.491            —
647       0.465         0.691            32.7%
584       0.433         0.521            16.9%
100       3.040         6.938            56.2%
414       0.483         1.008            52.1%
201       0.473         0.508             7.0%
198       1.318         1.873            29.6%
 59       0.983         1.203            19.3%
136       0.794         0.928            14.4%
 50       2.800         6.720            58.3%
 50       1.920         3.313            42.1%
 55       2.473         3.704            33.2%
 61       0.705         0.897            21.4%


The median percentage of the total variance attributable to
differences in accident liability is 31.15. The percentages range from 7% to
58.3%. In nine of the twelve cases the percentage is less than 50.
These figures hardly correspond to the impressions one is likely to derive
from textbook accounts. The share of differences in accident liability
[...] of the cases while the rest of the variance which is more than
[...] large must be attributed to unpredictable “chance” factors.

Newbold (9) collected a large number of sets of data from a number
of factories. The factories were chosen on the basis of uniformity of
work performed, completeness of accident recording and opportunities
[...] minor accidents. The large majority of the accidents were
of a minor nature, the author stating that the serious injuries were too
few for correlational work. The findings differ in some respects from
those of Greenwood and Woods.

A variety of results can be found in Newbold’s material.
Nevertheless, in general the ratio $(\sigma^2 - m)/\sigma^2 \times 100$
tends to be considerably larger than in the data of Greenwood and Woods.
It also tends to be larger than in the other studies we have examined.
This difference between Newbold’s data and those of the other
investigators is due in part to the fact that the mean numbers of
accidents per person are rather large as compared to those of most of the
other distributions. The irregularly variable factors should become
relatively less and less important in the long run. Still, this is not
the whole explanation. The ratios computed for Newbold’s material remain
large even when compared to ratios from distributions with similar means.
Table 6 presents these ratios as computed from the statistics given in
Newbold’s paper; the number of cases and mean numbers of accidents as
given by Newbold and the variances (squares of Newbold’s standard
deviations) are also given. The figures may be compared to the
corresponding ones in Table 5.

The median percentages are 71.6 and 56.05 for the men and women 
respectively. The range is very great, the largest figure being 90% 
while at the other extreme there is an obtained variance which is actually 
slightly smaller than that of the corresponding Poisson distribution; 
this distribution closely approximates a simple chance distribution. 
These percentages do not accurately represent the share of differences in 
accident liability in the variance of accident records in all cases. In- 
spection of Newbold’s curves suggests that many of the obtained accident 
distributions deviate significantly from composite Poisson distributions. 
This matter was only partially investigated. The amount of work 
involved in the computation of composite Poisson distributions for thirty 
nine sets of data would have been prohibitive, particularly because these 
data are given by Newbold in the form of graphs rather than tables. 
Many of these graphs appear to have been inaccurately drawn, inasmuch 
as there are discrepancies between the totals of workers and accidents 
as read off from the graphs and as given in Newbold’s Table. 

Nevertheless, it can be shown that in some of Newbold’s sets of data 
composite Poisson distributions are appropriate and the percentage of 
the variance attributable to differences in accident liability is large. As 
an example, Table 7⁵ presents the data from Newbold’s graph ATII,
together with the corresponding composite Poisson figures. The closeness
of the fit is apparent. The accident liability share is 75.8%. 

Some of Newbold’s sets of data suggest that the distribution of 
accident liability was a discontinuous one; in these sets of data the great 
bulk of the cases fit either a simple or a composite Poisson distribution, 
but there are also a few deviant cases which lie outside of such distribu- 
tions. Most of the obtained variance of accident records due to accident 
liability may be due to the presence of these deviant cases; in other words, 
large deviations from the average accident liability appear only in a very 
small minority of cases. Thus Newbold’s set EII is essentially a dis- 
tribution of the simple Poisson type, plus one markedly deviant worker. 
Set EV may be viewed as a distribution of the composite Poisson type 
(excess variance = 41%) plus 9 deviant workers. Table 8 presents 
these data. 


5 See footnote 2. 




Table 6

Analysis of Newbold’s Data

Number                  Variance
of Cases   Mean (m)     ($\sigma^2$)     $(\sigma^2 - m)/\sigma^2 \times 100$

Men
226         .18           .59            68.0
 22         .27           .20            —
256         .41           .69            40.5
 81         .43           .45             4.2
106         .48           .59            19.1
281         .51           .94            43.8
299         .57           .81            29.6
190         .68          1.72
 50        1.04          1.99
 47        1.47          3.76
 82        1.61          5.66
148        1.81          5.11
218        1.95          7.13
181        2.50          6.60
304        2.56         10.56
 93        2.66         12.53
 77        2.73         18.84
284        2.90         23.33
440        3.64         13.76
352        3.78         17.14
301        3.94         14.90
376        3.98         14.06
 92        4.07         18.15
 57        5.60         56.25
204        6.44         41.86

Women
380         .37           .53
 50         .52          1.04
120         .63          1.64
110         .65          1.46
161         .70          1.21
346         .79          1.35
142        1.06          1.77
145        1.06          2.04
125        1.34          3.24
 98        1.39          3.39
100        2.12          5.57
161        2.30          8.58
 58        2.43          7.88
 28        5.43         15.52


The differences between Newbold’s findings and those of Greenwood
and Woods, and of other investigators whose material is examined in this
paper, may possibly be attributed to the fact that her material consisted
almost entirely of minor accidents. In spite of Newbold’s statement,
the reporting of accidents may not have been complete. It is difficult
to ascertain the degree of completeness with which minor accidents were
reported and there may have been individual differences in the reporting
of accidents, producing the illusion of large differences in accident
liability. On the other hand, constant personal characteristics may play
a more direct role in the causation of minor accidents than in that of
major ones. Psychoanalysts generally believe that many accidents
are unconscious self-injuries. It is possible that such unconscious
self-injuries usually result in minor damage, just as in hysteria, in which
minor injuries are common while major injuries are unusual. Minor
accidents in industry may be often due to psychological mechanisms of the
hysterical type.⁶

6 This hypothesis was suggested to the writers by E. Emmons.


Table 8

Comparison of Two of Newbold’s Sets of Data with Theoretically
Computed Distributions

                    Set EIII*                          Set EV

                       Equal Liability                 Unequal Liability
                       Distribution                    Distribution
                       (omitting                       (Composite Poisson)
Accidents   Actual     1 case)            Obtained     (omitting 9 cases)

    0        201         197                 24              22
    1         21          26                 22              19
    2          2           2                  8              12
    3          1           0                  5               7
    4          0           0                  6               4
    5          0           0                  1               2
    6          0           0                  1               1
    7          0           0                  1               0
    8          0           0                  3               0
    9          0           0                  2               0
   10          1           0                  1               0
   11          0           0                  0               0
   12          0           0                  0               0
   13          0           0                  0               0
   14          0           0                  0               0
   15          0           0                  1               0
   16          0           0                  1               0
Total        226         225                 76              67

* The discrepancy between the “actual” and the “equal liability” accident totals is
[...] as given in a table in Newbold’s paper.


The distribution of the Connecticut licensed car drivers is essentially
a composite Poisson distribution. A Greenwood and Yule “unequal
liability” distribution fits the data rather well, except at the upper end.
This can be accounted for if one assumes that the distribution of
accident liability deviates slightly from a Pearson III curve. The
proportion of the variance of accident records attributable to
differences in accident liability is 21.2%. Table 9 presents these data.⁷

There is corroborative information from other sources, indicating that
differences in accident liability often account for only a relatively small
portion of the variance of accident records. The correlations between
accident records in different periods of time reported by Newbold (9)
and more recently by Ghiselli and Brown (5) are in most instances not
[...] a median of +.42; we omit the intercorrelations between different
kinds of accidents presented in both papers which are considerably lower.
These correlations justify inferences which are similar to those we arrived
at by the use of a different method.

[...] rates for people with different accident records are nonexistent.
This practice is in conformity with our findings. The usual textbook
discussions of accident proneness would suggest very different insurance
rates for different accident records.
When no composite Poisson distribution conforms to a set of accident
records the suggested procedure is not applicable. The existence of factors
must be assumed, which alter the shapes of the constituent Poisson
distributions. Changes in accident liability of people as a function of
previous accidents encountered suggest a possible explanation of such
results. We did not attempt to verify this possibility inasmuch as there
seemed to be no way of arriving at a reasonably plausible hypothesis about
the course of such changes in terms of information available at present.
The only hypothesis suggested so far in the literature seems to have been
the one underlying Greenwood and Yule’s “Biassed distribution,” and it is
unsound theoretically and therefore unsuitable for research. This
distribution is simply a Poisson distribution with a different first term.
If there were no initial differences in accident liability, but the first
accident altered the accident liability of its participants which would
subsequently remain constant, the resulting distribution would not be an
incomplete Poisson distribution, because the one accident class would not
[...] in the “simple chance” case. An incomplete Poisson distribution
could be produced only by continuing changes in accident liability with
successive accidents, and it would be a strange coincidence if these changes
should be so graded as to produce a tail end of a Poisson distribution
which has a completely different derivation.

The distributions which deviate significantly from any composite 
Poisson distributions are two of Greenwood and Woods’ distributions 
(their Table 1A and 1B), the distributions of taxicab accidents and the 
distribution of street car accidents. Inspection of the data indicates 
that the obtained distributions are more leptokurtic than Poisson distri- 
butions, and compounding several of the latter can only flatten out the 
resulting shape. Several of Newbold’s distributions may be in the same 
category; they were not examined in detail for reasons stated earlier. 
The share of differences in accident liability in the total variance cannot 
be determined in such cases. The existence of other factors than differ- 
ences in accident liability and unpredictable “chance” factors must be 
assumed. 

Discussion 


It must be remembered that not all differences in accident liability 
are differences in accident proneness viewed as an individual character- 
istic. This point is not a new one; it has been made among others by 
Newbold and by Cobb. It is disregarded by investigators who combine 
data about street car accidents or taxi accidents from different cities. In 
factory work, different jobs differ in conditions of safety. In automobile 
or other vehicle driving, the safety conditions are not necessarily the
same from route to route, in city compared with city. The amount of
mileage driven, necessary driving in adverse weather, etc., must contri- 
bute more opportunities for accidents and these are not functions of 
accident proneness defined as an individual trait. For example, only 
21.2% of the variance of the accident records of the Connecticut drivers 
was due to differences in constant accident rates. When one considers 
the hazards of driving just mentioned, it seems logical to state that there 
is not much room for differences in accident proneness as a psychological 
characteristic, insofar as these data are concerned. 

We have pointed out that in many instances the portion of the vari- 
ance of accident records attributable to differences in all forms of accident 
liability is relatively small as compared to the residual variance attribu- 
table to the operation of factors which are not predictable in terms of 
either the constant characteristics of people or of their previous accident 
records. These unpredictable or “chance” factors when operating alone 
give a so-called simple chance or equal liability or Poisson distribution. 
The expression “chance factors” should not be misunderstood. They 
are not necessarily unpredictable in terms of changing features of the life 
situation. Thus a well known psychoanalyst spoke to one of the writers 
about a man he knew who had a temporary period of accident proneness 


_ Reexamination of the Accident Proneness Concept 209 


of marital trouble, during which time he had several near- 
rapid succession. ‘Chance” refers only to lack of predic- 
ty in terms of constant characteristics of the individual. 

There are many kinds of such “chance” factors. One kind does not […]
is clearly at fault in causing an accident, the accident might
not have occurred if the circumstances had been different. One of the
writers was once in a car driven by a man who did badly enough to have
had a very serious accident: the driver became frightened by a wasp
on his leg and stopped looking at the road; shortly afterwards the car
rolled into a ditch at the bottom of an embankment to the left of the
highway. There had been no cars in the other traffic lane at the moment
the car crossed it, the embankment was not steep and there was no accident.
[…] half a mile further there was a steep drop into a river on the left
[…] The expression “luck” seems to be quite appropriate here.
As Cobb pointed out, the correlation between accident records and
a perfect test of accident proneness need not be high. One cannot use
an arbitrary criterion for classifying people as excessively accident-prone.
For example, Poffenberger (10) states that “accident prone
drivers are those who have two or three times as many accidents as the
average driver . . . the term need not be restricted to auto accidents
for it covers equally well accident repeaters in industry.” In
the distributions examined here the average number of accidents per person
is one-half an accident or less. According to Poffenberger then, this
would mean that persons with one or more accidents are to be considered
accident prone. This is obviously unfair. It is legitimate to select
for study those people who have more than the average number of accidents,
but they should not be automatically classified as excessively
accident prone without further evidence. Actually within a simple
chance distribution some people are likely to have two to three times as
many accidents as the average person. One can verify this by referring
to the Poisson distributions in our tables. In most published distributions
only a very small minority have accident records which lie completely
above the point at which the Poisson distribution gives negligible
frequencies. As one approaches this point, one finds additional cases of more
than average accident proneness, but some people with only average
accident proneness who have had bad luck or temporary difficulties are
included in the group of people who have had many accidents. The
problem of the exact estimation of the relative number of accident-prone
individuals and bad luck individuals in any particular group of accident
records is complicated. One should not attempt to make rough estimates
from a comparison of obtained frequencies with the corresponding
Poisson frequencies.
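The point about a simple chance distribution can be checked numerically. The sketch below is illustrative only: it assumes a mean of exactly one-half accident per person (the upper bound quoted above) and asks what share of people, all with identical liability, would still show one or more, or two or more, accidents under a Poisson distribution. The function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k accidents under a Poisson law with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def share_with_at_least(k_min, mean):
    """Share of people expected to have k_min or more accidents by chance alone."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(k_min))


# Assumption: mean of 0.5 accidents per person, everyone equally liable.
mean_accidents = 0.5
share_one_or_more = share_with_at_least(1, mean_accidents)
share_two_or_more = share_with_at_least(2, mean_accidents)

print(f"one or more accidents: {share_one_or_more:.1%}")
print(f"two or more accidents: {share_two_or_more:.1%}")
```

Under these assumptions roughly two people in five have at least twice the average number of accidents, although no one differs in proneness.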


210 Alexander Mintz and Milton L. Blum 


Summary 


1. A commonly used method of comparing percentages of men and of 
accidents proves nothing about the existence of differences in accident 
proneness. Examples proving the inconclusive nature of the method are 
cited. 

2. Comparison of obtained accident distributions with simple chance 
(Poisson) distributions establishes that there are differences in accident 
liability but does not indicate whether these differences are large or small 
and does not exclude the simultaneous operation of unpredictable 
“chance” factors. 

3. Different accident records do not necessarily represent different 
degrees of accident liability. A method for analysis of the variances 
of accident records of people into two component variances is suggested, 
one component attributable to differences in accident liability, the other 
to unpredictable “chance factors.” It is pointed out that the method is
only applicable when the obtained distribution resembles a composite
of Poisson distributions.

4. A number of published distributions of accidents are examined 
by the use of the above method. The variance attributable to differences 
in accident liability varies considerably. 


In the distributions which are examined in this paper and which do 
not involve primarily minor accidents, the variance attributable to 
differences in accident liability is in most cases between twenty and forty 
per cent of the total variance of accident records. Although differences 
in accident liability should not be overlooked as a factor in the different 
accident records of people, the effect of this factor is rather small as com- 
pared to the residual 60 to 80 per cent attributable to unpredictable 
factors. It is therefore apparent that in many instances personal accident 
proneness, which is but one of the components of accident liability, has 
been an overemphasized factor. 
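The partition of variance summarized above can be sketched as follows. The sketch relies on a standard identity for a composite of Poisson distributions: the variance of the counts equals the mean count (the "chance" component) plus the variance of the individual rates (the liability component). Whether this matches the authors' computations in every detail is an assumption, and the frequency table is hypothetical, not taken from any published distribution.

```python
def variance_components(freq):
    """Split the variance of accident counts into chance and liability parts.

    freq maps number of accidents -> number of people.
    Assumes the counts resemble a composite of Poisson distributions.
    """
    n = sum(freq.values())
    mean = sum(k * f for k, f in freq.items()) / n
    total_var = sum(f * (k - mean) ** 2 for k, f in freq.items()) / n
    chance_var = mean                       # Poisson variance equals its mean
    liability_var = max(total_var - chance_var, 0.0)
    return mean, total_var, liability_var


# Hypothetical distribution of accidents among 1,000 people.
hypothetical = {0: 650, 1: 230, 2: 80, 3: 30, 4: 10}
mean, total_var, liability_var = variance_components(hypothetical)
liability_share = liability_var / total_var

print(f"mean = {mean:.2f}, total variance = {total_var:.4f}")
print(f"share attributable to liability = {liability_share:.1%}")
```

For this made-up table the liability share comes to about 27 per cent, which happens to fall inside the twenty to forty per cent range reported above.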

Received November 2, 1948. 


References 


1. Cobb, P. W. The limit of usefulness of accident rate as a measure of accident
proneness. J. appl. Psychol., 1940, 24, 154-159.
2. De Silva, H. R. Why we have automobile accidents. New York: John Wiley & Sons,
3. Elderton, W. P. Frequency curves and correlation. London: Layton, 1906.
4. Fisher, R. A. Statistical methods for research workers. London: Oliver & Boyd,
5. Ghiselli, E. E., and Brown, C. W. Accident proneness among street car motormen
and motor coach operators. J. appl. Psychol., 1948, 32, 20-23.
6. Greenwood, M., and Woods, H. M. The incidence of industrial accidents upon
individuals with specific reference to multiple accidents. Industr. Fatigue Res.
Bd., Rep. No. 4, 1919.
7. Greenwood, M., and Yule, G. U. An enquiry into the nature of frequency distributions
representative of multiple happenings, with particular reference to the
occurrence of multiple attacks of disease or of repeated accidents. J. Roy.
Statist. Soc., 1920, 83, 255-279.
8. Kendall, M. G. The advanced theory of statistics. Philadelphia: J. B. Lippincott,
9. Newbold, E. M. A contribution to the study of the human factor in the causation
of accidents. Ind. Fatigue Res. Bd., Rpt. 34, 1926.
10. Poffenberger, A. T. Principles of applied psychology. New York: D. Appleton-Century
Co., 1942.
11. … taxicab accidents. Metropolitan Life Insurance Company, New York,
12. … Psychol., 1930, 26, 29-38.
13. Viteles, M. S. Industrial psychology. New York: W. W. Norton & Co., 1932.


Method of Paired Comparisons and a Specification Scoring 
Key in the Evaluation of Jobs *


G. A. Satter 
University of Michigan 


Within recent years, public and industrial employers have increasingly 
attempted to place their wage structures on objective bases. Among the 
techniques employed to this end are those which are commonly referred 
to as “job evaluation methods.” Collectively, these methods represent 
attempts to rate jobs in order to determine their relative worth with 
respect to other jobs and to use the job’s standing, within the group 
of which it is a member, as a basis for assigning a dollars-and-cents 
value to it.

The most widely used methods fall into four general classes: (a) Those 
in which the operation of evaluation is one of comparing job against job 
in terms of the job’s overall worth (Ranking Method); (b) in which it is 
one of comparing job against job in terms of specific “elements” or traits 
(Factor Comparison Method); (c) in which it is one of comparing the job 
against an arbitrarily defined scale of overall worth (Classification 
Method); and (d) in which it is one of comparing job against arbitrarily 
defined scales covering individual job traits or “elements” (Point Evalua- 
tion Method). 

From time to time, various authors have described alternatives to, or
modifications of, the above basic methods but for the most part these
methods have retained their popularity with surprisingly few modifications.
Thus, Viteles (10) and, more recently, Otis and Leukart (7)
have recommended that the Method of Paired Comparisons be used as
an alternate to the Ranking Method. So far as the present writer knows,
no organization has ever given this recommendation a trial. Similarly,
there are other scaling methods which might profitably be applied to the 
problem of jobs; on its face, the problem of scaling jobs does not seem to 
be pronouncedly different from that of scaling other subject matters. 
These alternative methods, too, have been neglected. 

The present report describes the results of applying two psychometric 
techniques to the problem of building job scales in two industrial plants. 

* The writer expresses appreciation to Mr. A. J. Miller, Assistant Director of Industrial
Relations, The Mead Corporation, for his advice and support on these projects,
and to Hugh Black, C. Alvin Hoffman, and Robert Rock who assumed major responsibility
for the collection and analysis of the data presented here.




One procedure involves the application of the Method of Paired Comparisons,
and the other, the development of scoring keys which can be
applied to job specifications. Both procedures are oriented toward developing
a scale the points of which are defined by jobs and which can
be used in making the kinds of job measurements which are helpful in
setting up wage schedules.


I. Construction and Characteristics of Job Scales 
Built by the Method of Paired Comparisons 


aratory to the scaling project, 
The study methods were 
United States Employment 
king down the organiza- 
tional level of the jobs 

oint opinions of the 
ponsible for the 
ll, and the job 
and organizing 


d 
In both plants the judgments called for 
made by those persons within the 
to know the jobs in question best, 
isors immediately re- 


ents. 
dures to be used were 
ials which 


by their respecti 
objectives of the projec 

d, and the members of the c 
ere to use in arriving at and reportin 


derably more detail than one conventionally 


descriptions prepared for a job evaluation project. The objective in each 


provide the reader, even though he had little previous contact with the job, 
i d knowledges which it required, 


Jonsibilities which it entailed, and the conditions under which it was typically 
haracteristics which are 


-in short, to arrive at judgments concerning those ¢ 
associated with job worth. 


job descriptions contained consi 




consisted of a bound volume of the job descriptions, which had been prepared 
earlier, and a set of forms on which their judgments were to be recorded. It 
was then possible for the members of the committee to proceed independently, 
and at their leisure, to make their judgments. It might be pointed out here, 
that the routines of the Method of Paired Comparisons are particularly well 
adapted for use in the industrial situation. Since comparatively naive judges 
can be introduced to the task called for by the method with a minimum of 
training, it is possible to work with large numbers of judges who can proceed 
independently under a minimum of supervision. 

By following this general procedure, the jobs in the two plants were scaled 
independently on four traits or “elements” which a preliminary review of the 
literature of job evaluation indicated as being potentially most useful in dis- 
criminating between clerical jobs. For purposes of the evaluation, these traits 
were defined in the following manner: 


(a) Educational Skills. The degree to which the job demands preparatory 
skills (verbal, quantitative, etc.) which are most generally acquired in the 
schoolroom. 

(b) Work Skills. The degree to which the job demands specialized skills 
which can en be acquired either through job training or by extended experi- 
ence on the job. 

(c) Application Skills. The degree to which the job makes special demands
on the individual worker; the degree to which the job is unpleasant, tiresome,
monotonous, dirty, etc.

(d) Social and Personal Skills. The degree to which the job requires 
pe pene aeons skills—skill in supervising and in coordinating the activities 
of others. 


Thus, both groups of judges were required to make their inter-job comparisons
in four frames of reference. If the Method of Paired Comparisons
had been used in its traditional form, this would have meant that each judge
in Plant A would have had to make $\frac{n(n-1)}{2} \times 4$ judgments (9,660) and
those in Plant B, 2,112. To reduce the number of pairs of jobs in Plant A to
a more feasible number, a suggestion which Uhrbrock and Richardson (9) 
made earlier was followed. By using key jobs, against which all comparisons 
were made, and groups of ten jobs in which only the in-group comparisons 
were made, the total number of judgments made by each judge was reduced 
from 9,660 to 3,660. These job groups were set up in the following manner. 
The investigating staff selected, from the group of 70 jobs, ten which in their
opinion seemed to fulfill the dual criterion of being generally well known and
of collectively representing the entire range of abilities required by the
seventy. These constituted the “key group.”² The sixty remaining jobs,
then, were assigned to groups of ten in a random fashion. In preparing the
worksheets for the judges, a scrambled order of pairs was used; each job title
was presented first in half of the pairs; and the pairs involving the key jobs
were interlaced throughout the whole list. No judge was informed that the
key job device was being employed. In Plant B, the judges’ worksheets called
for the complete set of 2,112 judgments. Apparently, as we shall see later
when the results from the two plants are compared, the modified procedure
employed in Plant A did not distort the final results. Making and recording
the judgments required from six to ten hours of the judge’s time.
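The judgment counts quoted above can be reproduced with a short calculation. The sketch assumes that each non-key job was compared with every job in its own group of ten and with each of the ten key jobs, and that the key jobs were also compared among themselves; this reading reproduces the reported 3,660. The figure of 33 jobs for Plant B is inferred from the reported 2,112 judgments and is not stated in the text. Function names are hypothetical.

```python
from math import comb

TRAITS = 4  # each pair of jobs is judged once per trait


def full_design(n_jobs, traits=TRAITS):
    """Judgments per judge when every job is paired with every other job."""
    return comb(n_jobs, 2) * traits


def key_job_design(n_key, n_groups, group_size, traits=TRAITS):
    """Judgments per judge when non-key jobs meet only their own group and the key jobs."""
    within_key = comb(n_key, 2)             # key jobs against each other
    within_group = comb(group_size, 2)      # in-group comparisons
    against_key = group_size * n_key        # each group member against each key job
    per_trait = within_key + n_groups * (within_group + against_key)
    return per_trait * traits


plant_a_full = full_design(70)
plant_a_reduced = key_job_design(n_key=10, n_groups=6, group_size=10)
plant_b_full = full_design(33)  # 33 jobs is an inference from the reported total

print(plant_a_full, plant_a_reduced, plant_b_full)
```

The reduction comes entirely from dropping comparisons between jobs that belong to different non-key groups.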


² The key jobs were: Dark Room Technician, Junior Stenographer, Mail Boy, Payroll
Clerk, Record Clerk, Scheduling Clerk, Secretarial Assistant, Statistical Supervisor,
Stenographer, and Telephone Operator.




Computation of the Scale Values. The data from each group of judges were
treated following a “shortcut” procedure recommended by Guilford (2), and
the scale value equivalents of each job were computed. This procedure
was used in preference to that called for by Thurstone’s Case V of the
Law of Comparative Judgment for the following reasons: (a) The small number
of judges used hardly warranted the laborious operation of computing the
several estimates of each scale separation which is required by the Thurstone
procedure and (b) Guilford (2) has demonstrated empirically the comparability
of the scale values derived from using his abbreviated procedure and those
from the Case V procedure.
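The contrast between the two scaling routes can be illustrated with a small sketch. It assumes that the shortcut amounts to averaging each job's proportions of favorable judgments and converting that single average to a normal deviate, whereas Case V converts every proportion to a normal deviate first and then averages. This reading of the shortcut is an assumption, not a description confirmed by the text, and the proportion matrix is hypothetical.

```python
from statistics import NormalDist, mean

JOBS = ["A", "B", "C"]

# Hypothetical data: P[i][j] is the proportion of judges rating job i above job j.
# Diagonal entries are set to 0.5 by convention.
P = [
    [0.5, 0.7, 0.9],
    [0.3, 0.5, 0.8],
    [0.1, 0.2, 0.5],
]

to_deviate = NormalDist().inv_cdf  # proportion -> unit normal deviate


def shortcut_scale(p):
    """Average each row of proportions, then convert once to a normal deviate."""
    return [to_deviate(mean(row)) for row in p]


def case_v_scale(p):
    """Convert every proportion to a normal deviate, then average each row."""
    return [mean(to_deviate(x) for x in row) for row in p]


def rank_order(values):
    """Indices of the jobs from highest to lowest scale value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


for job, s, v in zip(JOBS, shortcut_scale(P), case_v_scale(P)):
    print(f"{job}: shortcut {s:+.3f}, Case V {v:+.3f}")
```

On this toy matrix the two routes order the jobs identically and differ only in the spacing of the scale values.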


Results 


The results of the operations described above were four skill scales
presumed to be capable of measuring the dimensions on which
clerical jobs are commonly paid. At this point, the problem of
validating the measurements yielded by these scales arises. If a suitable
criterion is available, multiple correlational procedures are probably most
appropriate. In a certain sense, the validation of job scales presents an
even more difficult problem than is typically encountered in the validation
of personnel selection instruments; here, the problem is not only one of
measuring the criterion, but, in the first place, of defining one. Lacking
acceptable standards, in the typical wage evaluation project, job
measurements are evaluated in terms of how well they reproduce the
existing wage structure in the plant or the wage structures of other
plants in the area. Both procedures obviously have serious
limitations.

In the project described here, wage survey data for similar jobs
outside the plant were assembled with the expectation that these data
could be employed as a “criterion.” Preliminary tabulations made it
obvious that these data were incapable of generating correlation
with anything, even themselves; the differences in wages paid for what
seemed to be similar jobs were oftentimes as large as, or even
larger than, those which existed between different jobs. Accordingly, in
both cases, the plants’ prevailing rates were used as criteria in combining
the scale values of the four skill scales.

A multiple regression equation was written for predicting rates from
scale values. The multiple R’s resulting from the application of the
regression equation were .77 and .83 in Plants A and B respectively.
In both plants, the Work Skills Scale contributed the most toward accounting
for the total variance of “going rates.” Apparently then, the
measurements made by paired comparisons can yield measurements
which are capable of ordering jobs with respect to their worth.
The results reproduced in Table 1 also reveal that even better scales
could be developed; the skill scales obviously do not measure independent



dimensions. This would suggest: (a) that the original choice of the 
traits was a poor one; (b) that the traits were poorly defined; and/or (c) 
that the judges were not highly proficient in making the kinds of dis- 
criminations which this project called for. 


Table 1

Intercorrelations of the Scale Values Derived for Each of Four Job Traits
and Their Correlations with Rates

                                     Plant A*                 Plant B*
Trait                           2     3     4     5      2     3     4     5
1. Educational Skills          .98  -.49   .73   .71    .89  -.37   .74   .73
2. Work Skills                      -.39   .75   .71         -.40   .82   .82
3. Application Skills                     -.34  -.14               -.57  -.42
4. Social and Per. Skills                        .66                      .70
5. Going Rates

* None of the inter-plant differences in the z-equivalents of the r’s attain statistical
significance.

Other characteristics of the scale values derived here may be pointed 
out. For one, the analyses suggest that these values are in general inde- 
pendent of the particular population of jobs chosen, i.e., that they have 
general validity. The correlations between the scale values of jobs in 
Plant A and those for jobs in Plant B, which the job analysis data indi- 
cated as similar in content, are presented in Table 2. These findings 
should be of special interest since they suggest that “standard scales” are
feasible—that scales can be developed which will be of general applica- 
bility in job evaluation projects. 


Table 2

Correlations between the Scale Values Derived in Plant A with Values Derived
for Twenty-three Similar Jobs in Plant B

Job Trait                       r_AB
Educational Skills              .92
Work Skills                     .92
Application Skills              .34
Social and Personal Skills      .91


Further analyses of these data suggest high consistency in the judg- 
ments made by the several judges. Table 3 summarizes these findings. 
The coefficients reported in Column r₁₁ are average intercorrelations between
judges (5) and may be regarded as estimates of the reliability of the


Table 3

Reliability of the Judgments on which the Scale Values were Based

                                   Plant A            Plant B
Job Trait                       r₁₁     r_AA       r₁₁     r_AA
Educational Skills              .805    .982       .941    .993
Work Skills                     .777    .978       .911    .990
Application Skills              .623    .956       .826    .979
Social and Personal Skills      .812    .983       .904    .989


Table 4

Correlation between the Sum of Judgments Made by Employee and
Management Representatives

                                Plant A    Plant B
Job Trait                         r_s        r_s
Educational Skills                .96        .99
Work Skills                       .93        .98
Application Skills                .92        .94
Social and Personal Skills        .95        .91


the raa Column are the estimates 
ing from applying the Spearman-Brown prophesy formula to the 

Using comparatively large groups of evaluators obviously 
highly reliable judgments. These coefficients compare quite 
ted for “point-evaluation” judg- 


än the literature (4, 6). Further, from the above it may be pre- 
ing committee were 


dual judge’s judgments; those in 


between the sums of employee and management judgments are reported
in Table 4. This finding would suggest, then, that the attitudes of
judges who participate in a job scaling project (if we can assume
that there were differences in the attitudes of the members of our groups)
are not likely to color their judgments of the jobs. This finding is consistent
with the findings of other investigators (1, 3) who have studied
the scale values assigned to opinion statements by judges who differ
pronouncedly in their attitude toward the object being investigated.
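The relation between the two reliability columns of Table 3 can be sketched with the Spearman-Brown formula, which steps up the reliability of a single judge to that of the pooled judgments of k judges. The number of judges is not legible in this section; the value of 13 used for Plant A below is inferred by solving the formula backward from the tabled coefficients and should be treated as an assumption.

```python
def spearman_brown(r_single, k):
    """Reliability of the pooled judgments of k judges, given single-judge reliability."""
    return k * r_single / (1 + (k - 1) * r_single)


def implied_k(r_single, r_pooled):
    """Number of judges implied by a single-judge and a pooled reliability."""
    return r_pooled * (1 - r_single) / (r_single * (1 - r_pooled))


# Plant A coefficients from Table 3: (average intercorrelation, stepped-up value).
plant_a = {
    "Educational Skills": (0.805, 0.982),
    "Work Skills": (0.777, 0.978),
    "Application Skills": (0.623, 0.956),
}

ASSUMED_JUDGES = 13  # inferred, not stated in the legible text

for trait, (r11, r_aa) in plant_a.items():
    stepped = spearman_brown(r11, ASSUMED_JUDGES)
    print(f"{trait}: implied k = {implied_k(r11, r_aa):.1f}, "
          f"stepped-up r = {stepped:.3f} (tabled {r_aa:.3f})")
```

Even the least consistent trait reaches a pooled reliability above .95 once a dozen or so judges are combined, which is the practical argument for large judging groups.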
Summary: The Method of Paired Comparisons

In these investigations jobs were scaled on four traits by using the
Method of Paired Comparisons. The results of these investigations




indicate that jobs can be scaled on these dimensions and that the measure- 
ments yielded by such scales can effectively be used to order jobs in a 
fashion which is valid for rate setting. The findings further suggest 
that the method used results in scale values which are independent of 
the particular population of jobs chosen. 

At the practical level, the methods employed are particularly well 
adapted for industrial usage: (a) They permit the participation of large 
numbers of evaluators; (b) they can be employed with comparatively 
naive evaluators i.e., little training time is demanded; (c) even untrained 
evaluators report little difficulty in making the judgments called for; 
(d) the judgments can be made with a minimum of supervision and 
follow-up review; and (e) the resulting measurements are highly reliable. 


II. Construction and Use of a Scoring Key for a Job Specification Form


In the same plants in which the investigations described above were 
made, trained job analysts collected and summarized other job data; 
they prepared specification forms which in form and content were some- 
what like the Worker Characteristics Form employed earlier by the United 
States Employment Service in its job studies (11). The items, of which 
there were eighteen, covered various aspects of the skills and knowledges 
required by the jobs analyzed. Each item was prefaced by a brief 
statement defining a particular skill (or knowledge) and this was followed 
by three or four alternative phrases or statements descriptive of various 
degrees of skill. These alternatives were drawn up arbitrarily to define
approximately equal distances along the skill scale. The following is a 
sample item: 


Responsibilities for planning and laying-out work.

____ a. All work planned and laid out by the supervisor.

____ b. Particular class of tasks allocated to worker; lays out own schedule
according to established routines.

____ c. Works on a job basis but has the responsibility for setting up own
work operations and schedule.

____ d. Particular class of tasks allocated to worker; responsible for setting
up own work operations and schedule.


Collection of the Data. As in the case of the job description preparation,
described in Section I, the ratings called for by the specification forms were 
made on a cooperative basis by the job analyst, the immediate supervisor, an 
D He fet Aon ER job. One hundred and three oes oe 
in 3 t asis 
for the analysis reported in this Dae Prepared eae 
Analysis of the Data. Collectively, the items of the job specification form 
cover the same subject matter that was dealt with in the two scaling projects 
described above, so it seemed reasonable to presume that the ratings reported 


* See Table 4, 




on the specification sheets might be turned to the same usage as the paired
comparisons data, namely, to order jobs with respect to their worth. Accordingly,
a scoring key was developed for these items.

Each of the 18 ratings for the 70 jobs in Plant A was correlated with “going
rates” and an equation was written for combining the “scores” of the individual
items. In this equation, the individual item ratings were weighted in terms
of their correlation with rates and the reciprocals of their respective standard
deviations.

Table 5

Correlations between the Items of the Specification Form and Going Rates and
the Standard Deviations of the Individual Item Ratings

Item on Specification Sheet                                      r      S.D.
…ial schooling demanded by the job                              .46     .60
… the use of numbers and numerical operations                   .55     .75
… the use of words, spelling and vocabulary                     .29     .81
… reading                                                       .29     .99
… training needed for the acquisition of job skill              .08    1.94
… on the job                                                    .61    1.72
Kind of supervision received on the job                         .37     .91
Responsibility for planning and laying out work                 .72     .82
Responsibility for making decisions                             .58     .40
Conditions under which work is performed                       -.03      …
… nature of work, interesting, stimulating or routine and …    -.28      …
… demands of the job                                           -.04     .39
Supervision given to other workers                              .52     .58
Relationships with other workers on the job                     .82     .51
Relationships with persons outside the department               .46     .96
… in oral expression                                            .46     .72
…ity to maintain confidences                                     …       …
Appearance and dress requirements                                …       …


As a check on the accuracy of the scoring key developed (i.e., the
ability of the key to reproduce the criterion on which it was built), the 70


tion of the item weights a larger proportion of the criterion 
variance could have been accounted for. The operation of correlating
scores with the criterion on which the scoring key was originally built is,
of course, no check of either the validity of the procedure or its general
usefulness. Accordingly, a similar set of specifications, which was developed
in Plant B by another group of job analysts, was scored with
the key developed in Plant A; again the resulting scores were correlated
with rates. In Plant B, specifications correlated .92⁴ with rates. Thus,
when an independent criterion and a new population of jobs are employed,
the scoring key is found to be quite satisfactory.
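The weighting scheme behind the scoring key can be sketched as follows. The sketch assumes that each item weight is simply the item's correlation with going rates divided by the standard deviation of its ratings; the exact form of the author's equation is not legible here, so this is one plausible reading. The three items and their r and S.D. values are taken from the legible rows of Table 5; the job ratings are hypothetical.

```python
# (correlation with going rates, standard deviation of ratings), from Table 5.
ITEMS = {
    "planning and laying out work": (0.72, 0.82),
    "supervision received": (0.37, 0.91),
    "use of numbers": (0.55, 0.75),
}


def item_weight(r, sd):
    """Weight an item by its validity and the reciprocal of its rating spread."""
    return r / sd


def key_score(ratings):
    """Weighted sum of a job's item ratings (ratings: item name -> rated degree)."""
    return sum(item_weight(*ITEMS[name]) * degree for name, degree in ratings.items())


# Hypothetical ratings for two jobs on a 1-4 degree scale.
job_x = {"planning and laying out work": 3, "supervision received": 2, "use of numbers": 4}
job_y = {"planning and laying out work": 1, "supervision received": 1, "use of numbers": 2}

print(f"job x: {key_score(job_x):.3f}")
print(f"job y: {key_score(job_y):.3f}")
```

Dividing by the standard deviation keeps items with widely spread ratings from dominating the total merely because of their scale.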


Summary: A Scoring Key for a Job Specification Form 


The procedures used in the development of a scoring key for job
specification forms have been described. Such a scoring key was found
to yield scores which are related to wage payments made to clerical 
workers. There is some evidence to support the conclusion that such a 
scoring key developed in one plant may be of general usefulness in 
evaluating similar jobs in other plants. 


Discussion 


The two approaches to job measurement described here may be com- 
pared and contrasted. As indicated above, they yield results which are 
very similar so that one’s choice between them would probably be 
governed by considerations other than one of accuracy or validity of 
measurement. First, it might be pointed out, the scoring key resulting 
from the application of Method Two, can be developed in a much shorter 
period of time primarily because the volume of data dealt with is much 
smaller; in contrast, the Method of Paired Comparisons, even when 
“short cuts” are employed, is always cumbersome. Further, with 
Method Two, once adequate job analysis data have been collected, it is 
a comparatively simple task to collect the judgments called for by the job 
specification; but, it must be borne in mind that judgments of this sort 
can only be made by persons who have very intimate contacts with the 
jobs for which they are preparing specifications. Training of the evalu- 
ators might, of course, overcome this limitation. 

It might be argued then, that the Method of Paired Comparisons is 
more suitable for those kinds of projects where: (a) it is desirable to make 
the scaling project a cooperative one with comparatively large judging
groups representing all interests, and (b) where one has a minimum 
amount of time to devote to the training of the judging group. Apart 
from the fact that paired comparisons data are generally highly reliable, 
and that the method has a well-established theoretical basis, the above 
characteristics, in many industrial plants, would strongly recommend 
this method. 


⁴ Note that this coefficient is slightly higher than the one obtained on the initial
validation. The difference in these two values does not attain statistical
significance.




On the other hand, it is the writer’s opinion that the scoring-key
method may be particularly valuable in certain special circumstances.
Once such devices have been developed, they may be of particular usefulness
in those situations where a comparatively small number of new
jobs must be slotted into an already established wage structure, or
where the manufacturing unit is so small as to make other more
elaborate procedures impractical. The scoring-key method can easily
be used as a supplement to any of the commonly used job evaluation
methods.


T 14, 1948. 
References 


1. Ferguson, L. W. The influence of individual attitudes on construction of an attitude
scale. J. soc. Psychol., 1935, 6, 115-117.
2. Guilford, J. P. The method of paired comparisons as a psychometric method.
Psychol. Rev., 1928, 35, 494-506.
3. Hinckley, E. D. The influence of individual opinion on the construction of an
attitude scale. J. soc. Psychol., 1932, 3, 283-296.
4. … M. Job evaluation of non-academic work at the University of Illinois.
J. appl. Psychol., 1948, 32, 15-19.
5. Kelley, T. L. Statistical method. New York: Macmillan, 1923.
6. Lawshe, C. H., and Wilson, R. R. Studies in job evaluation. 6. The reliability of
point rating systems. J. appl. Psychol., 1947, 31, 355-365.
7. Otis, J. L., and Leukart, R. H. Job evaluation. New York: Prentice-Hall, 1948.
8. Thurstone, L. L. A law of comparative judgment. Psychol. Rev., 1927, 34, 273-286.
9. Uhrbrock, R. S., and Richardson, M. W. Item analysis. Person. J., 1933, 12,
141-154.
10. Viteles, M. S. A psychologist looks at job evaluation. Personnel, 1941, 17, 165-176.
11. Training and reference manual for job analysis. Prepared by the Division of Occupational
Analysis, War Manpower Commission. Washington: U. S. Gov. Print.
Office, 1944.


The Effect of Equating Interest Test Items for Prestige Value 


Elizabeth Fehrer 
Brooklyn College 
and 


Hans Strupp 
U.N.R.R.A. 


A number of the currently used vocational interest scales require the 
subject to choose between pairs or groups of occupational titles or activities.¹
The assumption is made that degree of interest in a given field may
be measured by the frequency with which one selects items that fall in 
this field over the other items with which they are grouped. 

Interest scores obtained in this way should be most reliable when 
the items are matched for all factors that might influence choice except 
interests or personal values. Where choices are required between oc- 
cupational titles, for example, preferences might at times be determined 
by such factors as the prestige of the occupations or their monetary return 
rather than by interest in a general type of work. Thus, if an item re- 
quires a choice between the occupations of United States Senator and 
scientific laboratory assistant, a person of high scientific interests might 
choose Senator because of its far greater prestige value. At the present 
time, however, there is no experimental evidence of the effect that factors
such as these exert on interest scores.² It is the purpose of the present
study to determine the extent to which the factor of prestige can in- 
fluence such scores. The study originated in an attempted revision of 
the Allport-Vernon Study of Values (1). One type of item in the proposed 
revision consisted of pairs of occupational titles, the occupations being 
chosen to represent Spranger’s value categories. From these items a
person's score for a value category was to be determined by the frequency
with which the occupations representing the category were preferred over 
those with which they were paired. It was in connection with the con- 
struction of this part of the test that the question arose concerning the 

¹ See, for example, the Kuder Preference Record (4), the Thurstone Interest Schedule
(11) and the Occupational Interest Inventory (Lee and Thorpe) (5).

² The advisability of holding such factors constant, however, has been recognized.
For example, in the construction of the Occupational Interest Inventory (5), the activities
in each item have been roughly matched for job level.




advisability of matching the occupations for prestige value. The experiment
here described is an attempt to answer this question.³

The plan of the study involved the following steps:

1. The selection of occupational titles that fall into the Spranger categories.

2. The scaling of these occupations for prestige value by Thurstone’s method
of equal-appearing intervals.

3. The construction of an interest inventory in which the items consisted
of pairs of occupational titles. From this scale, three separate scores could be
computed for each Spranger value category. For example, one of the aesthetic
scores was to be based on items in which aesthetic occupations were
paired with other occupations equal to the aesthetic in prestige value. The
second aesthetic score was to be computed from items in which the aesthetic
occupations were higher in prestige value than the occupations with which they
were paired. The third aesthetic score was to be computed from items in
which the aesthetic occupations were lower in prestige value than the occupations
with which they were paired. If prestige influences occupational preferences,
these three scores should differ significantly.
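The three-score scheme in the steps above can be sketched as follows. Everything in the example is hypothetical: the occupational titles, their category assignments, and their prestige scale values are invented for illustration and are not the items used in the study. The sketch counts, for one value category, how often its occupation is chosen when it is equal to, higher than, or lower than its partner in prestige.

```python
# Hypothetical occupations: name -> (value category, prestige scale value).
OCCUPATIONS = {
    "sculptor": ("aesthetic", 6.0),
    "bank teller": ("economic", 6.0),
    "music critic": ("aesthetic", 7.5),
    "sales clerk": ("economic", 4.0),
    "sign painter": ("aesthetic", 3.5),
    "corporation president": ("economic", 9.5),
}


def prestige_relation(target, other):
    """Is the target occupation higher, lower, or equal in prestige to its partner?"""
    diff = OCCUPATIONS[target][1] - OCCUPATIONS[other][1]
    if diff > 0:
        return "higher"
    if diff < 0:
        return "lower"
    return "equal"


def category_scores(pairs, chosen, category):
    """Count choices of the category's occupation under each prestige relation."""
    scores = {"equal": 0, "higher": 0, "lower": 0}
    for pair, pick in zip(pairs, chosen):
        targets = [occ for occ in pair if OCCUPATIONS[occ][0] == category]
        if len(targets) != 1:
            continue  # item does not pit this category against another one
        target = targets[0]
        other = pair[0] if pair[1] == target else pair[1]
        if pick == target:
            scores[prestige_relation(target, other)] += 1
    return scores


pairs = [
    ("sculptor", "bank teller"),
    ("music critic", "sales clerk"),
    ("sign painter", "corporation president"),
]
chosen = ["bank teller", "music critic", "corporation president"]

print(category_scores(pairs, chosen, "aesthetic"))
```

A respondent like this one, who picks the aesthetic title only when it carries the higher prestige, is the pattern the study was designed to detect.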


The only factors systematically varied in this study were the prestige
value of the occupational titles and the interest categories in which they
fall. Other factors that might affect choice, such as financial
return or social service value, were neither isolated nor controlled. It
was assumed that the influence of such factors would be similar to the factor
of prestige, as Anderson (2) has shown high correlations between these
factors.

Method of Selecting the Occupational Titles 


The primary concern in the selection of the occupational titles was 
that they could be unambiguously classified into the Spranger value 
categories. Five interest categories were used: theoretical, economic, 
aesthetic, political, and social-religious. We decided arbitrarily to combine 
the social and religious values because (1) it was impossible to 
find a sufficient number of distinct occupations that fitted in the religious 
category and (2) both social and religious occupations seemed to involve 
altruistic and social-service interests and activities.4 

3 It is well known that in certain situations prestige is an important determiner of 
choice. The pioneer study of Moore (8) demonstrated that students’ preferences for 
linguistic expressions, ethical situations and musical dissonances were influenced by 
knowledge of expert opinion. Studies by Marple (7), Sherif (9) and others have confirmed 
Moore’s findings. In these studies, the opinion of experts was made explicit in 
the experimental procedure by assigning one of the choices or statements to the authority. 
In the present study, the factor of prestige operates in an entirely different 
manner; it is inherent in the item. 

4 Correlational and factorial studies of scales measuring the Spranger values have 
reported differences in the correlations between social and religious value scores. 
Van Dusen, Wimberly and Mosier (12) report a correlation of .61; Ferguson, Humphreys 
and Strong (3) a correlation of .22. Lurie (6) reports a factor with high loadings on [...] 
Even though social and religious values may be somewhat distinct, occupations 
within the religious category seem to entail social service activities as well 
as interest in spiritual values. 


224 Elizabeth Fehrer and Hans Strupp 


In searching for titles that could be classified in this manner, it soon became 
apparent that the list would have to be limited to professional, sub-professional 
and business occupations. These are the only types of occupations that seem 
to represent the Spranger values in an unambiguous manner. Skilled, semi- 
skilled, clerical and many other types of occupation had to be excluded. For 
example, the profession of artist seems clearly to fall into the aesthetic category 
whereas the occupation of house painter clearly fits neither the artistic, eco- 
nomic, nor any of the other categories. Consequently, the prestige range 
covered by the occupations chosen represents only a small fraction of the entire 
occupational prestige range. The range, however, is representative of that 
covered in certain existing interest scales. 


One hundred occupational titles were selected, 20 to represent each 
of the five value categories. 


Method of Construction of the Psychophysical 
Occupational Prestige Scale 


Thurstone’s method of equal-appearing intervals was followed in 
scaling the 100 occupational titles in respect to prestige value. 


Fifty students in an advanced undergraduate class in experimental psy- 
chology served as judges. The 100 titles were printed on separate cards, 
arranged in random order, numbered, and a set was presented to each judge 
with the instructions to sort them into seven piles with apparently equal 
intervals between them. A seven-step scale was used instead of the traditional 
eleven-step scale since it was believed that with occupations as homogeneous 
in respect to social prestige as the ones chosen it would be impossible to dis- 
criminate eleven steps. With this exception, Thurstone’s procedure was 
followed in determining the median and Q values for each occupational title. 
These values are presented in the second and third columns of Table 1. 
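
The scaling step can be illustrated with a short sketch. It is not the authors' worksheet: it assumes each of the seven piles is treated as a unit interval centered on its number, with linear interpolation inside the pile, and it takes Q as the full interquartile range, as the text later describes it. The function names and the sample sorts are hypothetical.

```python
from collections import Counter


def grouped_quantile(piles, p, n_steps=7):
    """Quantile of pile placements with linear interpolation within a pile.

    Assumption: pile k covers the interval [k - 0.5, k + 0.5].
    """
    counts = Counter(piles)
    target = p * len(piles)
    cumulative = 0
    for k in range(1, n_steps + 1):
        c = counts.get(k, 0)
        if c and cumulative + c >= target:
            return (k - 0.5) + (target - cumulative) / c
        cumulative += c
    return n_steps + 0.5


def scale_value_and_q(piles):
    """Return (median scale value, Q) for one occupational title.

    Q here is the full interquartile range (Q3 - Q1).
    """
    median = grouped_quantile(piles, 0.50)
    q = grouped_quantile(piles, 0.75) - grouped_quantile(piles, 0.25)
    return median, q


# Hypothetical sorts by 50 judges for a single title.
example_sorts = [2] * 10 + [3] * 30 + [4] * 10
median, q = scale_value_and_q(example_sorts)
```

A title on which the judges agree closely yields a small Q; a title scattered over many piles yields a large one.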


Table 1 
High, Median, Low and Mean Scale Values of the Occupational Titles 
in each Interest Category * 

                         Scale Values 
Interest Category    High    Median    Low     Mean 
Political            0.60    1.91      6.35    2.45 
Economic             1.75    4.45      6.45    4.43 
Theoretical          1.72    3.34      5.22    3.37 
Aesthetic            1.78    3.52      6.24    3.77 
Social-Religious     2.25    3.86      5.70    4.01 


* To reduce printing costs, Table 1 is presented here in greatly abbreviated form. 
The complete table, showing median scale values, Qs, and per cent unambiguous agreement 
in classification into value categories for each of the 100 occupational titles, has 
been deposited with the American Documentation Institute. For the six pages involved, 
order Document 2624 from the American Documentation Institute, 1719 N 
Street, N.W., Washington 6, D. C., remitting $0.50 for microfilm (images 1 inch high 
on standard 35 mm. motion picture film) or $0.50 for photocopies (6 x 8 inches) readable 
without optical aid. 




Inspection of Table 1 shows rather marked differences in the prestige 
values of the occupations representing the various Spranger values. On 
the average, the political occupations ranked very high in prestige. The 
highest ranking occupations belonged in this category. The means 
of the median scale values for the occupations in each category are as 
follows: political, 2.45; theoretical, 3.37; aesthetic, 3.77; social-religious, 
4.01; economic, 4.43. These values of course refer only to the occupations 
selected in this study. 

Because the political occupations ranked so very high in respect to 
prestige, this category had, later, to be excluded. There were simply 
not enough low-ranking political occupations to pair with others in constructing 
the interest scale. 

The Q values (double the usual semi-interquartile range) of the [...]. The 
[...] ambiguity value was 1.72. These Q values are large compared 
with those found in the construction of an attitude scale. The size of 
Q is undoubtedly a function of the homogeneity of the items, but at the 
same time it also represents the fact that in our culture, professional and 
business occupations do not fall into a strict hierarchy in respect to 
prestige. 

In the construction of the interest scale, occupations were not discarded 
on the basis of high Q values. The justification for this procedure 
is that the scores for each interest were to be based on a fairly large 
number of items. Consequently, individual differences in susceptibility 
to the prestige of the individual items should cancel. 


Method of Checking the Accuracy of Classification 
of the Occupational Titles 


In order to check on the accuracy of the classification of the 100 occupational 
titles into the Spranger value categories, 50 advanced undergraduate 
students were asked to sort them into the five categories. They were given, 
in random order, descriptions of the five interest [...] 
and Allport (13), and they were asked 
to place each title in one of the interest categories whenever this was possible. 
[...] referred to as unambiguous classifications. If an 
[...] of the categories, it could be placed in each. 
[...] asked to indicate whether it seemed to fit 
[...] whether its placement into one seemed 
[...] Such classifications will be referred 
[...] in Table 1 in the 
[...] subjects placing the occupations in 
[...] designated value category. 

The results indicated that there was fairly close agreement among 
the subjects concerning the proper classification of the majority of the 
occupations. 




Eighty of the occupations were placed in the same interest category 
by at least 80% of the judges. Of these 80 occupations, 67 were unambiguously 
placed in the same category by at least 80% of the raters. 
That is, at least 80% of these raters indicated no coordinate category for 
these 67 occupations. Thirteen additional occupations were placed in 
the same interest category by at least 80% of the judges, but a small 
proportion of the raters indicated a second coordinate category into which 
the occupation might also fit. These 13 occupations are designated by 
an asterisk. The remaining 20 occupational titles yielded less than 80% 
agreement and were therefore eliminated in the construction of the interest 
scale. These 20 occupations are designated by a double asterisk. 
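
The three-way sorting of titles by the 80% criterion can be sketched as follows. This is only an illustration of the rule as stated in the text; the data layout (one tuple of category labels per judge, with a second label standing for a coordinate category) and the function name are assumptions, and the judgments shown are invented.

```python
from collections import Counter


def classify_title(judgments, threshold=0.80):
    """Classify one occupational title from the judges' sortings.

    judgments: list of tuples; each tuple holds the category or
    categories one judge assigned (a single label means an
    unambiguous placement). This layout is an assumption.
    Returns (modal category, status).
    """
    n = len(judgments)
    placed = Counter(cat for j in judgments for cat in set(j))
    top_cat, top_n = placed.most_common(1)[0]
    unambiguous_n = sum(1 for j in judgments if j == (top_cat,))
    if top_n / n < threshold:
        return top_cat, "eliminated"       # double asterisk in the text
    if unambiguous_n / n >= threshold:
        return top_cat, "unambiguous"      # the 67 titles
    return top_cat, "coordinate"           # single asterisk, the 13 titles


# Invented judgments from 50 raters for one title.
ratings = [("A",)] * 38 + [("A", "SR")] * 6 + [("SR",)] * 6
category, status = classify_title(ratings)
```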

The eighty occupations that met the criterion of 80% agreement in 
classification were distributed as follows among the five value categories: 
16 theoretical, 19 economic, 17 aesthetic, 14 political and 14 social- 
religious. 

Method of Constructing the Interest Scale 


The two preliminary steps provided facts concerning the prestige 
values of the occupational titles and the degree to which each fitted the 
modified Spranger categories. The next and major task was to con- 
struct an interest inventory from which it would be possible to determine 
whether the prestige value of an occupation is a factor that will influence 
interest scores. 


After eliminating the occupational titles in the political category and those 
titles that did not meet the criterion of 80% agreement in classification by the 
judges, 66 titles were available for constructing the scale. 

The completed inventory was composed of 120 items, each item consisting 
of two occupational titles. Sixty of the items contained titles of equal prestige 
value, equal prestige value being defined as a difference of less than 0.50 points 
on the seven-point prestige scale. The mean difference in prestige value in 
these items was 0.19 points. The remaining 60 items contained titles that 
differed from each other by .60 points or more in prestige value. The mean 
discrepancy for these items was 1.53 points. 
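
The pairing rule just described can be expressed as a small check. This is a sketch under the thresholds given in the text (less than 0.50 for equal items, .60 or more for unequal items); the function name is hypothetical, and pairs falling between the two thresholds are simply treated as unusable.

```python
def item_type(prestige_a, prestige_b, equal_below=0.50, unequal_from=0.60):
    """Label a candidate pair of titles by their prestige difference.

    Returns "equal", "unequal", or None when the difference falls in
    the unused gap between the two thresholds.
    """
    diff = abs(prestige_a - prestige_b)
    if diff < equal_below:
        return "equal"
    if diff >= unequal_from:
        return "unequal"
    return None


# Scale values taken from the medians in Table 1, used only as sample inputs.
kind_close = item_type(3.52, 3.34)   # difference of 0.18
kind_far = item_type(1.78, 3.86)     # difference of 2.08
```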
The inventory was constructed in such a way that the following four scores 
could be computed for each interest category: 

1. An Equal Score. This score was derived from the 60 items in which the 
prestige values of the occupations making up the items differed by less than 
.50 points on the 7-point prestige scale. In this part of the scale, each interest 
category was compared with each other category ten times. That is, for 
example, 10 items involved comparisons of T and E titles; 10 involved comparisons 
of T and A titles; 10 involved comparisons of T and SR titles, etc. 
The maximum possible equal score for an interest category was 30. 

2. A Favored Score. This score was derived from 30 of the 60 items in 
which the prestige values of the occupations differed by more than .60 points 
on the prestige scale. Here each interest category was compared with every 
other 5 times. The maximum possible favored score was therefore 15. 

3. The Opposed Scores. These scores were derived from the remaining 
30 items in the same manner as the favored scores except that here prestige 
operated against the selection of the occupations in an interest category. 




4. Unequal Scores. The unequal score was the sum of the favored and 
opposed scores and represents the score for an interest category derived from 
the items in which the prestige values of the titles differed. The maximum 
possible unequal score for an interest is 30. If prestige influences occupational 
preferences, the unequal interest scores should be more alike than the comparable 
scores derived from items in which the prestige factor is held 
constant. 
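
The four scores described above can be tallied as in the following sketch. It is an illustration only, not the authors' scoring key. It assumes that a lower scale value means higher prestige (as the "High" column of Table 1 suggests), that each item is stored as two (category, scale value) pairs, and that a choice is recorded as 0 or 1; all of these names and layouts are hypothetical.

```python
from collections import defaultdict


def score_inventory(items, choices, equal_below=0.50):
    """Tally equal, favored, opposed and unequal scores per category.

    items:   list of ((cat, scale_value), (cat, scale_value)) pairs
    choices: 0 or 1 for each item, the index of the preferred title
    Assumption: a LOWER scale value means HIGHER prestige.
    """
    scores = defaultdict(
        lambda: {"equal": 0, "favored": 0, "opposed": 0, "unequal": 0}
    )
    for (first, second), pick in zip(items, choices):
        chosen, other = (first, second) if pick == 0 else (second, first)
        cat, value = chosen
        if abs(value - other[1]) < equal_below:
            scores[cat]["equal"] += 1
        else:
            # Prestige favors the chosen title when its scale value is lower.
            kind = "favored" if value < other[1] else "opposed"
            scores[cat][kind] += 1
            scores[cat]["unequal"] += 1   # unequal = favored + opposed
    return {cat: dict(s) for cat, s in scores.items()}


# Three invented items; the respondent picks the first title each time.
sample_items = [
    (("T", 3.0), ("E", 3.2)),
    (("A", 2.0), ("SR", 4.0)),
    (("A", 5.0), ("T", 3.0)),
]
result = score_inventory(sample_items, [0, 0, 0])
```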


Administration of the Interest Scale 


The items were arranged in random order, mimeographed, and the 
inventory was administered to 275 students in first-year psychology 
classes. From the completed inventories, 180 were chosen for analysis, 90 
from men and 90 from women students. For each student the 16 scores 
that have been described were determined, namely: 1. T, E, A and SR 
equal scores; 2. T, E, A and SR favored scores; 3. T, E, A and SR opposed 
scores; and 4. T, E, A and SR unequal scores. 


Analysis of Interest Scale Results 
Three types of analysis were undertaken to determine the effect of 
prestige value on occupational choices. The results of the three analyses 
are entirely consistent and all three show that prestige has no effect whatever 
on the interest scores. 

1. The first type of analysis involved computing correlations between 
the equal, favored, opposed, and unequal scores for each interest category 
in order to determine whether scores based on the various types of items 
were in agreement. These correlations are shown in Table 2. They have 
been computed separately for men and women. 


Table 2 
Correlations Between the Equal, Favored, Opposed and Unequal Scores for 
Each Interest Category 

Theoretical     Economic      Aesthetic     Social-Rel. 
Men  Women      Men  Women    Men  Women    Men  Women 


Scores on the equal and unequal scales are highly consistent. For 
men, the equal and unequal T scores correlate .93, the E scores, .95; the 
A scores, .87 and SR scores .87. The corresponding correlations for 
women are .89, .90, .87 and .84. Scores on these scales are based on 30 
items. 




It is apparent that these interest scores are highly consistent whether 
based on items in which prestige cannot contaminate the scores or on 
items in which prestige might exert some influence on vocational choice. 
These high correlations also indicate high reliability for the scales. 

The correlations between the favored and opposed scores are lower, 
ranging from .47 to .80. It must be remembered that these scores are 
based on only 15 items. 

The method of correlation will show only whether scores on two 
scales are similar in rank order. It does not indicate whether one set of 
scores is numerically higher than the other. In order to determine 
whether the various scores for an interest category were directly com- 
parable, a second type of analysis was undertaken. 

2. The second method of analysis consisted of directly comparing the 
interest scores based on the equal, favored, opposed and unequal scales. 
To facilitate comparison, the raw scores were first converted into percents.5 
These percent scores then represent the proportion of times that 
titles in the interest category under investigation were preferred to the 
titles with which they were compared. These scores are presented in 
Table 3. The scores for men and women are presented separately as 
the two sexes differed in their preferences. 
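
The conversion to percent scores is a simple ratio against the maximum raw score for each scale. The sketch below assumes the maxima stated in the text (30 for equal and unequal scores, 15 for favored and opposed scores); the dictionary and function names, and the raw score used, are hypothetical.

```python
# Maximum raw scores as stated in the text.
MAX_RAW = {"equal": 30, "favored": 15, "opposed": 15, "unequal": 30}


def to_percent(raw_score, kind):
    """Convert a raw interest score to a percent of the possible maximum."""
    return 100.0 * raw_score / MAX_RAW[kind]


# An invented raw equal score of 13 out of 30.
percent_equal = to_percent(13, "equal")
```

Putting the four scales on a common percent basis is what allows a favored score based on 15 items to be compared directly with an equal score based on 30.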


Table 3 
Equal, Favored, Opposed and Unequal Percent Scores for Each Interest Category 

            Theoretical    Economic      Aesthetic     Social-Rel. 
Scores      Men  Women     Men  Women    Men  Women    Men  Women 
Equal        43    36       60    31      49    66      48    67 
Favored      44    36       65    33      46    58      48    55 
Opposed      39    39       60    37      49    70      49    73 
Unequal      42    37       63    35      47    64      49    64 


It is evident that the factor of prestige has no significant effect on 
these percent scores. For the 90 men, for example, we find that the T 
titles are chosen 43% of the time from the equal scale, 44% of the time 
from the favored scale, and 39% of the time from the opposed scale. 
None of the differences are significant. 

Again, the SR titles are chosen 48% of the time from the equal scale, 
48% of the time from the favored scale and 49% of the time from the 
opposed scale. Again the differences, this time in the opposite direction, 
are not significant. The results for women are comparable. 


5 It should be remembered that the maximum raw equal and unequal scores were 30 
whereas the maximum raw favored and opposed scores were 15. 




Of the critical ratios presented in Table 4, only three are significant 
and these differences are not in the expected direction. For example, 
there is a significant difference (C. R. equals 4.18) between the equal 
and favored SR scores for women. Reference to Table 3, however, 
shows that the equal percent score is 67 and therefore higher than the 
favored score of 55. 


Table 4 
Critical Ratios Between Equal, Favored, Opposed and Unequal Scores 
for Each Interest Category 

Theoretical     Economic      Aesthetic     Social-Rel. 
Men  Women      Men  Women    Men  Women    Men  Women 


0.48 0.69 1.23 0.50 0.76 0.12 1.20 
0.03 1.29 0.66 0.83 2.43 0.18 4.18 
0.87 0.07 1.75 0.02 1.07 0.37 2.10 
0.83 1.22 1.05 0.84 3.48 0.53 6.32 


The significant differences cannot be attributed to the factor of prestige as there is no general 
tendency for the favored scores to be higher than the equal scores nor 
to be generally higher than the opposed scores. Differences in the 
percent scores are presumably due to (1) chance factors and (2) the 
particular titles that occur in the equal, favored, and opposed scale items. 
Although no analysis has been made of this last factor, it is obvious that 
some occupations are generally more popular than others. This factor 
was not controlled in the construction of the scale. 
3. The third type of analysis consisted in computing for each individual 
a susceptibility to prestige score. This score was obtained from 
the 60 items in which the prestige values of the occupations composing 
the item differed. The score consists simply of the number of favored 
occupations selected minus the number of opposed occupations. The 
maximum possible range of this susceptibility to prestige score is from 
minus 60 to plus 60, a positive score representing susceptibility to prestige. 
The mean susceptibility to prestige score for men was zero; for women, 
[...]. It is obvious that there is no tendency to favor occupations 
of higher prestige. 
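
The susceptibility to prestige score reduces to a difference of two counts over the 60 unequal items. A minimal sketch follows; the input format (one boolean per unequal item, true when the higher-prestige title was chosen) and the function name are assumptions made for illustration.

```python
def susceptibility_score(chose_higher_prestige):
    """Favored choices minus opposed choices over the unequal items.

    chose_higher_prestige: one bool per unequal item, True when the
    respondent picked the title with the higher prestige value.
    """
    favored = sum(1 for picked in chose_higher_prestige if picked)
    opposed = len(chose_higher_prestige) - favored
    return favored - opposed


# An invented respondent who follows prestige on 33 of the 60 items.
score = susceptibility_score([True] * 33 + [False] * 27)
```

A respondent indifferent to prestige would be expected to score near zero, which is the value reported for the men.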


Summary 
The results of these three analyses seem to show clearly that, insofar 
as college students are concerned, preferences for occupations within the 
range studied here are not determined by the prestige which is accorded 




to that occupation. Although differences in prestige are recognized, as 
is demonstrated by the fact that occupations can be scaled for this vari- 
able, occupational preferences are not determined by this factor. In- 
stead, preferences are apparently determined by far more basic interest 
patterns, and, as the consistencies of the scales suggest, these interest 
patterns constitute a relatively stable component of the personality. 
This, of course, has often been pointed out by Strong (10). 

It may, in addition, be safe to conclude that in the construction of 
professional interest scales when occupational titles make up the items, 
the prestige values of the items may be ignored. 


Received October 8, 1948. 


References 
1. Allport, G. W., and Vernon, P. E. A study of values. Boston: Houghton Mifflin 
& Co., 1930. 
2. Anderson, W. A. Occupational attitudes of college men. J. soc. Psychol., 1934, 5, 
435-465. 
3. Ferguson, L. W., Humphreys, L. G., and Strong, F. W. A factorial analysis of 
interests and values. J. educ. Psychol., 1941, 32, 197-204. 
4. Kuder, G. F. Preference record. Chicago: Science Research Associates, 1942. 
5. Lee, E. A., and Thorpe, L. P. Occupational interest inventory, advanced series. 
Manual of directions. Los Angeles: California Test Bureau, 1943. 
6. Lurie, W. A. A study of Spranger’s value-types by the method of factor analysis. 
J. soc. Psychol., 1937, 8, 17-37. 
7. Marple, C. H. The comparative susceptibility of three age levels to the suggestion 
of group versus expert opinion. J. soc. Psychol., 1933, 4, 176-186. 
8. Moore, H. T. The comparative influence of majority and expert opinion. Amer. 
J. Psychol., 1921, 32, 16-20. 
9. Sherif, M. An experimental study of stereotypes. J. abnorm. soc. Psychol., 1935, 
29, 371-375. 
10. Strong, E. K. Vocational interests of men and women. Stanford Univ. Press, 1943. 
11. Thurstone, L. L. Thurstone interest schedule. New York: The Psychological Corporation, 
1947. 
12. Van Dusen, A. C., Wimberly, S., and Mosier, C. Standardization of a values 
inventory. J. educ. Psychol., 1939, 30, 53-62. 
13. Vernon, P. E., and Allport, G. W. A test for personal values. J. abnorm. soc. 
Psychol., 1931, 26, 231-248. 


Personal Preference Differences among Occupational Groups 


Mary F. Mosier 
Bureau of Naval Personnel, Navy Department, Washington, D. C. 


and 


G. Frederic Kuder 
Duke University 


This study was conducted in order to explore the occupational patterns 
resulting from the use of a new measure of preference developed 
by one of the authors. The mean scores of twenty occupational groups, 
ranging from unskilled labor groups to highly professionalized vocations, 
were compared with the average scores of a group of unselected men. 
There has also been investigation of the differences in mean scores among 
three occupational levels. The three occupational levels are approximately 
those included in the major occupational groups of the Dictionary 
of Occupational Titles (1), as 0—Professional and Managerial occupations, 
1—Clerical and Sales occupations, 4 through 7, Skilled and Semi-skilled 
occupations. Results reported here are to be considered as suggestive, 
rather than conclusive, since an earlier abbreviated, and less reliable, 
form of the test was used in making the study. 

The new measure of preferences, entitled Preference Record—Personal, 
consists of five scales. Each of the scales has been developed so as to 
have high correlations among the items comprising a scale, and low correlations 
among the scales. The content of the scales may be described 
as follows: 

A. Preference for taking the lead and being in the center of activities 
involving people. 
B. Preference for dealing with practical problems and everyday 
affairs rather than interest in imaginary or glamorous activities. 
C. Preference for thinking, philosophizing, and speculating. 
D. Preference for pleasant and smooth personal relations which are 
free from conflict. 
E. Preference for activities involving the use of authority and power. 

It should be noted that the scales are based on recorded preferences. There 
is no implication intended that the scales measure actual facility in the 
activities. 



The sample of unselected men is composed of the first 450 respondents 
to the random sampling as described in the Manual (2). The only cri- 
terion for inclusion in this group was that the test blank be filled out in 
accordance with the instructions. For the sake of convenience we shall 
hereafter refer to this population of unselected adult males as the “base 
group.” 

Our occupational samples have been drawn from an additional group 
of more than 1000 returned Preference blanks obtained through the 
original sampling referred to above. Information as to the vocation of 
the subject was obtained from the personal data section of the test blank 
wherein he was requested not only to name but also describe his work. 
This double requirement of the subject that he both name and describe 
his employment made it possible to check doubtful titles and descriptions 
against those of the Dictionary of Occupational Titles, and thus increased 
the validity of our occupational groupings. For example, when job 
descriptions were checked we frequently found the title “Accountant” 
self-conferred on a “Bookkeeper.” 

The count and classification of the occupations represented in our 
sample of more than 1000 revealed more than fifty different occupations 
reported by our subjects. It was decided that those occupations rep- 
resented by twenty or more cases could be analyzed for the purposes of 
exploration. Accordingly, the test blanks of the members of the twenty 
such occupations were isolated for study. Table 1 lists the occupations 
grouped according to the appropriate level. There were 577 cases rep- 
resented in the twenty occupations which could be grouped as follows: 
Professional and Managerial, 10 occupational groups with a total of 
298 cases; Clerical and Sales, 5 occupational groups and 130 cases; and 
Skilled and Semi-Skilled, 5 occupations and 149 cases. Table 1 gives 
the number of cases for each occupational group. 


Procedure 


Mean scores and standard deviations for the base group of unselected 
adult males were computed for each of the five scales of the Preference 
Record—Personal. For each occupational group, only means were com- 
puted since, for purposes of testing the significance of mean differences, 
the variance of the unselected group was considered a much better esti- 
mate of the population variance than the variances obtained from the 
comparatively small occupational groups. Comparisons were then made 
between the means for a particular occupation and those of the base group. 
By such comparison we could observe the difference between the average 
member of an occupation and the average member of a general group 
with reference to the scale in question. In order to observe the effects 




Table 1 

Scale Scores for Base Group, Occupations and Occupational Levels for 
Preference Record—Personal 

Group                          N     A          B          C          D          E 
Base group, Mean               450   10.06      13.35      11.67      18.68      14.16 
Base group, S.D.                     4.1        3.2        3.5        5.2        3.9 

Professional and Managerial    298   10.1       13.4       12.1       19.8       15.0 
                               35    10.0       13.0       12.1       19.1       15.1 
                               25    9.4        13.2       13.0(+)    17.8       16.4(+) 
                               21    9.9        14.7(+)    11.6       20.0       15.3 
Engrs.                         29    9.7        13.9       15°        21.1(+)    14.7 
                               21    8.3(-)     13.2       13.0(+)    21.2(+)    15.0 
                               23    11.6(+)    14.1       14.3(+)    20.1       16.4(+) 
                               2     9.5        13.0       12.1       18.9       14.3 
                               56    10.3       13.4       11.5       19.3       13.7 
                               26    11.5(+)    11.8(-)    12.6       18.8       16.5(+) 
                               38    10.0       13.9       11.1       21.4(+)    14.8 

Clerical and Sales             130   10.3       12.9       12.0       19.3       14.4 
and Tellers                    35    8.7(-)     13.4       12.6       19.0       14.1 
                               22    9.6        13.1       11.1       20.4       13.6 
                               23    11.8(+)    13.3       11.8       21.7(+)    14.4 
other than to                  27    12.8(+)    11.4(-)    17         18.9       14.9 
to Consumer                    23    8.7(-)     183        12.5       17.0       15.4 

Skilled and Semi-skilled       149   9.9        13.6       11.3       18.4       18.4 
                               24    7.9(-)     14.3       10.2(-)    17.4       13.2 
                               33    10.3       13.6       11.2       18.5       14.0 
Workers                        50    10.4       13.4       11.4       18.8       12.3(-) 
                               20    11.6(+)    13.2       11.4       18.6       14.9 
                                     8.5(-)     13.4       12.4       18.1       13.7 


of occupational level, we have also computed the mean 
scores for each of the occupational levels on the five scales, and studied 
the differences found between the three major groups. The purpose of 
this procedure was to determine the relation between enjoyment of a 
type of activity and the position in the occupational hierarchy. 
For example, are the average scores of professional people higher on the 
scale relating to preference for use of authority than those found for 
the trades? Or, do clerical and sales employees indicate less 




favorable attitudes toward taking the lead than do professional and 
managerial? 

The significance of differences obtained was estimated in terms of the 
standard error of the difference between means. However, the standard 
deviation used in computing the standard error was that of the base 
group, rather than that of the occupational group in question, since this 
appeared to be a more accurate determination of the standard deviation 
of the universe than would any single sub-sample. The N in each such 
comparison was that of the occupational subsample.1 


Results 


The means and standard deviations for the base group sample are 
shown in Table 1. Also in Table 1 are shown the mean scores for the 
three occupational levels, (1) Professional & Managerial, (2) Clerical & 
Sales, (3) Skilled and Semi-Skilled Trades. These three levels were 
compared with each other rather than with the base group of unselected 
men. Inspection of Table 1 indicates that substantial differences between 
the base group and one or another of the three occupational levels did 
occur on several scales. However, the significance of these differences 
has not been computed. In Table 2 we have listed the differences in 
mean scale scores among the three occupational levels, and for those 
differences found to be above the 5% level, the critical ratio is shown in 
parentheses. 


Discussion of Results 


Any interpretation of results found in the study must be prefaced by a 
reminder of the small size of the occupational samples. Analysis of the 
data scale by scale yields information about both the scale and the attitudes 
of the various occupations toward the activity embraced by the 
scale. 

Scale A. The items on this scale relate to a preference for taking the 
lead and being in the center of activities involving people. Inspection 
of Table 1 shows that more significant differences between occupational 
groups and the general population sample occurred on this scale than any 
other. The mean score for ten occupational groups differed significantly 
from that of our base group sample. Five occupations indicate highly 
favorable attitudes toward the activities involving social leadership: (1) 
Personnel & Counseling Workers; (2) Sales Managers; (3) Insurance 
Salesmen; (4) Salesmen other than to Consumer; and (5) Foremen. The 


1 The formula used was: $SE_{\text{diff.}} = \sigma_b / \sqrt{N}$, where $\sigma_b$ is the standard deviation 
of the base group. 
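
The significance test described in the text can be sketched in a few lines. The printed formula is damaged in this copy, so the sketch rests on an assumption drawn from the surrounding description: the standard error is the base-group standard deviation divided by the square root of the occupational sample's N. The function name is hypothetical, and the sample values (a group of 50 with a Scale E mean of 12.3, against a base mean of 14.16 and S.D. of 3.9) are read from Table 1 as illustrative inputs.

```python
import math


def critical_ratio(group_mean, base_mean, base_sd, n_group):
    """Critical ratio of an occupational mean against the base-group mean.

    Assumption: SE = base_sd / sqrt(n_group), i.e. the base-group
    standard deviation stands in for the population value and N is
    the size of the occupational sample.
    """
    standard_error = base_sd / math.sqrt(n_group)
    return (group_mean - base_mean) / standard_error


# Illustrative inputs read from Table 1 (Scale E, a group of 50 cases).
cr = critical_ratio(group_mean=12.3, base_mean=14.16, base_sd=3.9, n_group=50)
```

A negative ratio of this size would correspond to a mean marked (-) in Table 1, although the authors' exact computation may have differed.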




occupations showing less-than-average enjoyment in these activities 
are: (1) Office Managers; (2) Accounting Clerks & Tellers; (3) Salesmen 
to Consumer; (4) Carpenters; and (5) Telephone Linemen. Two of 
the occupations which showed less than average interest in leading 
people or being in the center of a social situation are at first glance 
surprising. We refer to the Salesmen to Consumer group and the Office 
Manager group, both of which might be expected to indicate average or 
above average enjoyment of people. An examination of the personal 
data and job descriptions of members of these two groups throws some 
light on their attitudes not indicated by their job titles. We find among 
Salesmen to Consumers, a large number of individuals who report 
physical handicaps or old age or failure in some other line of work. A 
warranted generalization would seem to be that this group of salesmen 
did not select their vocation but were forced to it through inability to 
[...] 
with people. Rather they list as their duties “ordering supplies,” 
“taking incoming orders,” “making work schedules,” “reviewing reports,” 
“coordinating.” After checking these job descriptions there is 
no mystery in the responses these two groups have made on Scale A. 

The three occupational levels do not differ significantly in their preference 
for taking the lead and being in the center of things involving 
people. Preference for these activities characterizes skilled and semi-skilled 
trades as much as professional and managerial occupations. That 
the skilled group scored as high as it did may be attributed to the significantly 
higher mean score for the 20 foremen. 
Scale B. The content of this scale relates to a preference for activities 
of a practical nature, rather than imaginary or glamorous pursuits. 
When comparison is made between the various occupational groups 
and the base group, we find in Table 1 that there were three occupational 
groups showing significant differences. Chemical Engineers show a strong 
preference for practical activities, while Sales Managers and Salesmen 
other than to Consumer are less interested in these matters than is the 
average man. A number of occupations which might be expected to 
yield high preference scores failed to do so for these samples. A glance 
at Table 2 shows that the group of occupations labelled “Skilled & Semi-Skilled 
Trades” have a mean score that is significantly higher than that of 
Clerical & Sales occupations. No other significant difference between 
the occupational level groups was found for Scale B. 




Scale C. This scale may be described as preference for “thinking”— 
thinking of a philosophical or speculative nature. Significantly high 
mean scores were found for three occupations: Business Managers, 
Account Clerks and Tellers, and Personnel and Counseling Workers. 
The only significantly low mean was that for Carpenters. While the 
results for Personnel Workers and Carpenters may be in accordance with 
the hypothesis concerning the trait measured, those for the other two 
groups are not. Moreover, the lack of high scores for Chemical and 
Mechanical Engineers and Teachers, all of whom deal with abstract 
and conceptual ideas, does not seem consistent with the identification 
of this scale as thinking, philosophizing, and speculating. More cases 
and data on additional occupations are needed before this trait can be 
definitely identified. 


Table 2 
Differences between Means of Three Occupational Levels on the Five Scales of 
Preference Record—Personal * 

         Prof. and Man.     Prof. and Man.          Cler. and Sales 
         Minus              Minus                   Minus 
         Cler. and Sales    Sk. and Semi-Sk.        Sk. and Semi-Sk. 
Scale                       Trades                  Trades 
A        -.193              .209                    .402 
B        .516               -.150                   -.666 (1.95) 
C        .103               .820 (2.37)             .717 (1.94) 
D        .422               1.382 (2.67)            .960 (1.74) 
E        .579               1.668 (4.32)            1.089 (2.65) 

* Figures in parentheses represent the critical ratio of the significant differences 
found between means. 


Table 2 indicates that preference for activities measured by Scale C 
is related to occupational level. We observe that Professionals are higher 
than Clericals, but this difference is not significant. However, both 
Professionals and Clericals are, on the average, more favorable toward 
these activities than is the average Trade worker, and this difference is 
found to be beyond chance expectations. 

Scale D is designed to measure the individual’s preference for activities 
of an agreeable nature—activities free from conflict. The occupational 
group showing the highest enjoyment in pursuits of an agreeable nature 
was Insurance Salesmen, consistent with the stereotype of this group
as highly amicable. The next three groups, in order of mean score, were 
Teachers, Office Managers and Mechanical Engineers. No occupational 
group yielded a mean score significantly lower than that of the base group.

We find here on Scale D differences among the occupational levels in 


Personal Preference Differences 237 


mean scores. The average Professional is higher than the Clerical
[...] our figures, but the difference is small enough so that it can
be attributed to chance factors. However, we find the critical ratio of
the difference between Professionals and Trades of such magnitude as to
be very significant, and the differences between Clericals and
[...] significant. We must assume that the average man working in
skilled or semi-skilled trades can be expected to be considerably less
interested in activities of a pleasant, amicable nature than white collar
‘bred men. 
Scale E items relate to enjoyment of the use of authority and power.
Mean scores that show important and real differences from the mean
of the base group are to be found only in the Professional & Managerial
group. Business Managers, Personnel & Counseling Workers and Sales
[...] say that they certainly do like to exercise power. Lawyers
[...] found to be significantly high on this scale in a study made
[...]. The only group which indicated less than
[...] satisfaction in these activities were Factory Workers. It is
appropriate to [...]
“[...] Workers.” These were respondents to the random sampling
[...] indicated they worked in a plant or manufacturing concern and who
described their jobs by mentioning a single simple function repeatedly
performed, such as one phase of an assembly procedure. The group
is heterogeneous in that a great many different industries are
represented, but they are similar in that their work was described as taking
place in a factory where they performed a single mechanical or motor act.
[...] of skill ranged from unskilled to semi-skilled.
[...] level indicates that the
[...] related to position on the occupational
[...]. Professional & Managerial Workers are higher than Clerical &
Sales Workers, in general, in their preference for pursuits using authority.
The critical ratio, however, is only 1.43. A difference as large as this
in this direction could occur about 8% of the time through the operation of
chance. But Professionals are sufficiently higher in their mean score
than Trades that we can say the difference is too great to have arisen by
chance more than once in a thousand. The difference between Clericals [...]
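
The chance percentages quoted for the critical ratios can be reproduced approximately from the normal curve. The sketch below is illustrative only: it assumes the critical ratio is treated as a standard normal deviate and that the quoted probabilities are one-tailed ("in this direction"). The function name is hypothetical.

```python
import math


def one_tailed_p(critical_ratio):
    """Upper-tail area of the standard normal curve beyond the critical ratio."""
    return 0.5 * math.erfc(critical_ratio / math.sqrt(2))


# 1.43: Professional & Managerial vs. Clerical & Sales on Scale E (from the text)
# 4.32: Professional & Managerial vs. Trades on Scale E (from Table 2)
for cr in (1.43, 4.32):
    print(cr, one_tailed_p(cr))
```

Under these assumptions a ratio of 1.43 corresponds to a chance probability between 7 and 8 per cent, in line with the "about 8%" in the text, and a ratio of 4.32 falls well below one in a thousand.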


[...] this point has been from a “vertical”
view of the scales.

A “horizontal” view of the results presents somewhat different information
about the occupations and appears worthwhile.

There were five of the twenty occupational groups which showed no
significant deviation from the mean of our base group on any one of the




five scales. These occupations were Accountants, Plant Managers, 
Retail Managers, General Office Clerks and Electricians. With reference 
to the qualities measured by the Preference Record—Personal these people, 
or occupational groups, appear to be typical of the general population. 
It is interesting to note that all three occupational levels are represented 
in this group of occupations which show the same characteristics as the 
general population with regard to the variables measured. A study of 
larger groups may, of course, reveal significant differences. 

On the other hand, we find three occupations differed in their means 
from that of the base group on three of the five scales. These occupations 
showing an atypical pattern were Personnel & Counseling Workers, Office 
Managers, and Sales Managers. These three occupations can be said to 
differ from the average man more than any of the other seventeen occupa- 
tions with regard to the characteristics studied. 

Of interest also are the results for the three occupational levels. We 
found no important differences between the two higher levels on any of 
the five scales. However, the Skilled and Semi-skilled group, as shown 
in Table 2, shows rather marked differences from the other two groups, 
Professional & Managerial and Clerical & Sales. 

In comparing the highest occupational level with the lowest we see 
that the preferences of Professionals and Managers are higher than those 
of Skilled and Semi-skilled trades workers for activities involving
“philosophical thinking” (Scale C), pleasant relations (Scale D) and the use of 
authority (Scale E). 

Comparison of the group of Clerical & Sales occupations with those 
of the Skilled & Semi-skilled trades workers, reveals the same differences 
as those described in the paragraph above plus a difference in the opposite 
direction on Scale B. On Scale B, preference for activities of a practical 
nature, we observe in Table 2 that the trades occupations show a signifi- 
cantly higher mean score than the clerical group. 


Summary 


This study indicates that each of the five scales makes some dis- 
criminations by occupation, and that there is a relation between some 
occupations and the characteristic measured by each scale. 

Fifteen occupations differed on one or more of the scales. Five of 
the occupations did not differ from a general population sample on any 
of the scales.

We found that differences also occurred that are related to the level 
of the occupation in the economic or educational scheme. These differ- 
ences were rather large and significant between the group of occupations 




Skilled & Semi-Skilled Trades and the group, Professional &
Managerial, and also the group, Clerical & Sales.


References 


United States Employment Service, U. S. Department of Labor, Dictionary of
Occupational Titles, Part I.

Kuder, G. Frederic, Manual for Kuder Preference Record—Personal. Chicago:
Science Research Associates, 1948.


The OL Key of the Strong Test and Drive at the 
Twelfth Grade Level * 


Stanley R. Ostrom 
Department of Public Instruction, Dover, Delaware 


One of the baffling problems facing educators today is that of finding 
an instrument that will determine, with an acceptable degree of accuracy, 
which pupils possess the pattern of traits that enable them to make the 
best use of their abilities. If an instrument could be found that made it 
possible for a counselor to distinguish subjects whose backgrounds and 
native endowments were such that they could easily be activated to exert 
a maximum of energy from subjects whose backgrounds had pre-disposed 
a more lethargic set, it would be possible to predict scholastic and voca- 
tional success much more accurately than is now the case. 

The Occupational Level Key of the Strong Vocational Interest Blank 
for Men has been recommended by Strong (6, p. 195) and Darley (1, 
p. 60) as an instrument that will enable a counselor to make this distinc- 
tion. Kendall (3) and Ostrom (4) have demonstrated that the OL key 
of the Strong Blank can be used with considerable confidence for this 
purpose at the College Freshman level. This paper reports an attempt 
to determine the utility of the OL key at the twelfth grade level. 

Two hundred twelfth grade boys enrolled in four Central New York 
high schools formed the sample. One-half of these boys cooperated in 
an intensive study and the total group participated in a study which 
utilized their academic aptitude as measured by the American Council 
on Education Psychological Test scores, drive¹ as measured by the OL
key, and four year academic grade averages.

The 100 boys who cooperated in the intensive study were selected
in the following manner: from three of the four high schools a total of 
sixty boys were chosen so that twenty of them had very high scores on 

* This paper is one of a series reporting research in tools and techniques of counseling
conducted at the Psychological Services Center at Syracuse University. It is a portion
of a paper submitted as a Doctor’s Thesis under the direction of Dr. Maurice Troyer
in partial fulfillment of the requirements of the degree of Doctor of Education in the
School of Education, Graduate Division of Syracuse University, 1948. Other advisers
to whom the writer feels deeply indebted are Dr. Milton E. Hahn, Dr. William E.
Kendall, Dr. C. Robert Pace, and Dr. Eric Gardner.
¹ For purposes of simplicity, the pattern of traits discussed in the first paragraph will
be represented in subsequent pages of this report by the term drive.


OL, twenty of them had very low scores, and twenty of them had
scores that clustered around a scaled score of fifty. Thus, three groups
[...] differentiated by OL were obtained. In the fourth high
school forty boys were chosen in such a manner that their OL scores fell
[...] continuum. This was done to determine whether or not spuriously
high relationships between OL and the experimental variables would be
obtained in the other three high schools due to sampling methods.

Three new instruments were devised for purposes of checking on the
[...] of the OL key. These instruments were: (1) a Teacher’s Rating,
(2) an “Open End” interview, and (3) a “Guess Who” questionnaire.

The Teacher’s Rating (see Figure 1) was produced to measure drive in
the following areas: (1) drive for hobby satisfaction, (2) drive for
scholastic achievement, (3) drive for co-curricular achievement, and (4)
drive for vocational attainment. Each of the four traits was measured
on a seven-point scale with discrete descriptions utilized for each of the
[...].

The second instrument, a “Guess Who” questionnaire (see Figure 2),
was used in an attempt to determine how the young men felt about
the drive and persistence of their peers. In this instrument ten
descriptions were listed with space provided where each subject could name
three of his peers who best signified the quality required of each
[...].

The third instrument was the interview (see Figure 3). The writer
interviewed each individual, making use of the basic set of ten questions.
The subject was permitted to elaborate on each question as much as he
[...]. From time to time, secondary questions were asked to
encourage the subject to enlarge on the response given to the primary
question. By means of the ten questions, the writer attempted to elicit
from the subject information from his background, his past school,
[...] and hobby experiences as well as his hopes and plans that gave
evidence of the presence or absence of drive.

It was necessary to quantify the results of the three new instruments
before they could be of any value in determining relationships between
their results and those of OL.

The Teacher’s Rating Scales were filled in by five teachers who had
[...] each boy for at least one year. Each trait was measured on a
seven-point scale, hence the maximum score obtainable on each trait

.89 ± .03, N = 40. Test-retest method with two week interval.
.94 ± .02, N = 40. Test-retest method with two week interval.
[...] measure of reliability determined.


Fig. 1. Teacher’s Rating Scale. [Figure not legible in this copy: a seven-point rating scale with descriptive anchors for four traits: I. Drive for Hobby Satisfaction; II. Drive for Scholastic Achievement; III. Drive for Co-curricular Achievement; IV. Drive for Vocational Attainment.]




was seven, and the maximum score obtainable from all four traits was
twenty-eight.

Raters vary in the ratings they give in two ways. First, some tend
to rate students relatively high while others are more conservative in
their evaluations. Second, some raters are very discriminating in rating
[...] and the results they obtain vary over a large portion of the
[...]; others are much less discriminating, and the ratings they give
[...] a small portion of the range.


Fig. 2. “Guess Who.”

[...] you will find a number of descriptions which have been listed. You will
[...] a list of boys from your class. We are asking you to list the three boys
[...] that best fit each of the statements. You are not asked to choose only
[...]. The boys who best fit the descriptions may be boys you do not like very
[...]. [...] you for your cooperation.

[...] whom you feel will make the most of his abilities:

[...] who will work the hardest to gain an education:

[...] who participates most in co-curricular activities:

[...] on whom you would [...] with a boy of equal size, strength, speed, and ability:

[...] who has to be knocked down the most often in a fight before he will quit:

[...] who would be most apt to come back and win in a set of tennis if the score
[...] him were 5-4 with the count in [...] “Add” against him:




Fig. 3. Questions Used for Personal Interview.
1. Would you mind telling me what your father does for a living?
2. Do you know the highest grade (or degree) your father and your mother attained?
3. Would you give us a fairly complete picture of your work experience?
4. What do you expect to do when you are through with school?
5. Would you discuss your plans for gaining the training required for that job?
6. Would you care to tell me how you got interested in ____________?
7. Would you say that hobbies have played any part in determining your vocational
goals? If so, how?
8. Do you feel that you are satisfied with your school progress?
9. Would you say that your school work to date is a fair indication of your abilities?
10. What would you like to be doing in ten years?


To correct for these two difficulties the ratings of all the participating
teachers were converted into comparable measures.⁵

After the ratings had all been made comparable, the average rating
was then changed to a T-score⁶ (2, p. 99) so it could be utilized in
further statistical procedures.

In using the “Guess Who” device the boys were asked to list three
boys from the group in their school whom they felt best satisfied each of 
the ten descriptions. It was possible for a boy to list one of his peers on 
several questions. This happened on numerous occasions. The scores, 
which were obtained by counting the number of times each boy was 
listed, ranged from four to ninety-three. The scores thus obtained were 
also converted to T-scores. 

The results of the interviews were quantified by the following method: 
the boy’s responses to each question were rated from one to four in
terms of their expression of drive. A response that denoted much drive 


⁵ $X_{ba} = \frac{\sigma_a}{\sigma_b} X_b - \left[ \frac{\sigma_a}{\sigma_b} M_b - M_a \right]$
Where $X_{ba}$ equals measurement in distribution B transformed into the terms of
distribution A.
$X_b$ equals original measurement in distribution B.
$\sigma_a$ equals standard deviation of distribution A.
$\sigma_b$ equals standard deviation of distribution B.
$M_b$ equals mean of distribution B.
$M_a$ equals mean of distribution A (2, p. 121).
Since seven ratings were used, the mean score for distribution A was taken as the middle
score or four. The standard deviation was arbitrarily set at 1.8.
⁶ For purposes of this study T-score is used in the sense of Walker’s Z score, thus not
assuming normality.
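
The conversion described in these footnotes can be sketched in Python as follows. This is a hedged illustration rather than the author's computation: the ratings are hypothetical, the population standard deviation is assumed, and the T-score is taken to be the linear standard score 50 + 10z, which is how the reference to a score "not assuming normality" is read here. The function names are hypothetical.

```python
from statistics import mean, pstdev


def rescale_to_a(x_b, mean_b, sd_b, mean_a=4.0, sd_a=1.8):
    """Express a rating from distribution B in the terms of distribution A.

    Implements X_ba = (sd_a / sd_b) * X_b - [(sd_a / sd_b) * M_b - M_a],
    with M_a = 4 and sd_a = 1.8 as stated in the footnote.
    """
    ratio = sd_a / sd_b
    return ratio * x_b - (ratio * mean_b - mean_a)


def linear_t_scores(values):
    """Linear standard scores with mean 50 and SD 10 (no normalizing step)."""
    m, s = mean(values), pstdev(values)
    return [50 + 10 * (v - m) / s for v in values]


# Hypothetical ratings from one lenient, undiscriminating teacher.
ratings = [5, 6, 6, 7, 7]
m_b, s_b = mean(ratings), pstdev(ratings)

rescaled = [rescale_to_a(x, m_b, s_b) for x in ratings]
t_scores = linear_t_scores(rescaled)

print([round(r, 2) for r in rescaled])
print([round(t, 1) for t in t_scores])
```

After rescaling, every teacher's ratings share a mean of 4 and a standard deviation of 1.8, so differences in leniency and in spread no longer affect the averaged rating.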




was given a value of one; a response that denoted very little drive was given a value
of four. The total of the ratings for all questions comprised the score
for the interview. The boy with the lowest score, fourteen, thus
measured highest on drive in this measure. Table 1 shows the distribution of
[...] for the 100 boys. These scores were also converted to
T-scores but in a reverse manner so that low scores resulted in high
[...].

It was possible to correlate the results of the three original instruments
with OL scores since, as has already been stated, the scores obtained through
the instruments were all changed to T-scores. The correlations
are presented in Table 2. It is evident from the table that OL correlates

Table 1
Distribution of Ratings Given the 100 Twelfth Grade Boys on the
“Open End” Interview

                       Number of Times    Per Cent
Rating                 Rating Was Used    of Total

1 (Highest Value)           134              14
2                           304              33
3                           322              35
4 (Lowest Value)            169              18
Total                       929             100

Table 2
Relationship Between Three Variables and OL in a High School Population *

           School I    School II   School III  School IV   Total
           (N=29)      (N=16)      (N=15)      (N=40)      (N=100)

[...]      .56±.13     .39±.23     [...]±.22   .56±.11     .48±.08
[...]      .54±.14     .60±.17     .24±.26     .28±.15     .41±.08
[...]      .41±.16     .43±.22     .37±.24     [...]±.14   .41±.08
[...]      .72±.09     [...]±.13   [...]±.14   .57±.11     .59±.06
[...]      [...]±.09   [...]±.13   .55±.20     .56±.11     .61±.06
[...] Guess Who
           .57±.13     [...]±.23   [...]±.23   .30±.15     .39±.08
[...]      .56±.13     .58±.19     .35±.25     .51±.12     .53±.07

* Spearman’s Rank Difference formula was used in the first three schools due to the
[...] of students. In School IV and the Total, Pearson’s Product-Moment
[...].




to a very significant degree with each of the three instruments and that 
it correlates to a highly significant degree with the total score which
resulted when the T-scores of each of the three instruments were added.
It will also be noted that the magnitude of the correlations obtained in 
School IV does not vary significantly from those obtained in Schools I, 
II, and III. This tends to show that choosing three groups of high,
average, and low OL scores, as was the case in Schools I, II, and III, did
not in this instance permit spuriously high results. 

As a further check, Chi Square was used to determine the relationship 
between OL scores and the scores obtained in the Teacher’s Ratings, 
“Guess Who”, interviews, and total ratings. As can be seen in Table 3 
all four Chi Square results are of a magnitude that justify the rejection 
of the Null Hypothesis at the one percent level. 


Table 3
Relationship Between OL and Three Variables *
for High School Population

                              Chi       Confidence
Variables, Total Ss           Square    Level

Guess Who and OL              15.22     >1
Teacher Ratings and OL        24.06     >1
Interviews and OL             16.23     >1
Total and OL                  22.12     >1

* The Null Hypothesis states that the three OL groups: high, average, and low,
do not constitute different populations in terms of the “Guess Who” ratings, Teacher’s
Ratings, interview results, and the total results obtained by adding the T-scores of the
three variables for each boy. A chi square of 13.277 was necessary to reject the Null
Hypothesis at the 1% level of confidence.
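
The criterion value of 13.277 coincides with the tabled 1% point of chi square for four degrees of freedom. The sketch below assumes four degrees of freedom on that basis (the text does not state the degrees of freedom) and uses the closed-form upper-tail probability that holds for that case. The function name is hypothetical.

```python
import math


def chi2_upper_tail_df4(x):
    """P(chi-square with 4 d.f. > x), using the closed form exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2) * (1 + x / 2)


# Chi-square values transcribed from Table 3.
results = {
    "Guess Who and OL": 15.22,
    "Teacher Ratings and OL": 24.06,
    "Interviews and OL": 16.23,
    "Total and OL": 22.12,
}

print("criterion 13.277:", round(chi2_upper_tail_df4(13.277), 4))
for label, chi_square in results.items():
    print(label, chi_square, chi2_upper_tail_df4(chi_square))
```

Under this assumption each of the four obtained values has a chance probability below .01, which agrees with the rejection of the Null Hypothesis at the one per cent level.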


Having found a relatively high relationship between OL and the in- 
struments described above, an attempt was made to determine the relation- 
ship between OL and school achievement as measured by school academic 
grade averages. 

The assumption on which the study was based was that excellence in 
school was to some extent determined by motivation or effort expended. 
To find this relationship the 200 boys from the four high schools were 
divided into two groups, the first being made up of boys with high OL 
and the second made up of boys with low OL. With these two groups
the following two questions were posed: (1) do the two groups differ
significantly in scholastic achievement? (2) if so, how much of this 
difference is due to OL? 




[...] registered an F-ratio of 5.66 which was of a magnitude that
[...] confidence level for the rejection of the Null Hypothesis between
the [...] and one per cent levels, thus answering the first question
in the affirmative. To answer the second question it was necessary
to [...] for the other variable, academic aptitude as measured by the


Table 4

Analysis of Variance of Honor-Point Ratios
N = 100 Twelfth Grade Boys

Source of         Degrees of   Sum of      Mean                Test of
Variation         Freedom      Squares     Square     F *      Hypothesis **

Within groups       198        5417.12      27.35
Between groups        1         154.88     154.88     5.66     in doubt
Total               199        5572.00

* F = greater mean square/lesser mean square. By referring to Snedecor’s
[...] (5, 222-225), we may use the following three rules in testing the hypothesis:
Reject the hypothesis tested, if the calculated value of F is greater than the 1% point
[...].
Accept the hypothesis tested, if the calculated value of F is less
[...].
[...] remain in doubt, if the calculated value of F
[...].
** [...] groups, i.e., there is no signi[...].
(The 1% point was 6.76 and the 5% point was 3.89.)
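
The arithmetic of Table 4 and the three-way decision rule can be retraced with a few lines of Python. This is a sketch under stated assumptions: the rows with 198 and 1 degrees of freedom are taken to be the within-groups and between-groups sources, and the rule is coded as accept below the 5% point, reject above the 1% point, and in doubt between them. The function name is hypothetical.

```python
# Sums of squares and degrees of freedom transcribed from Table 4.
ss_within, df_within = 5417.12, 198
ss_between, df_between = 154.88, 1

ms_within = ss_within / df_within
ms_between = ss_between / df_between

# F = greater mean square / lesser mean square.
f_ratio = max(ms_between, ms_within) / min(ms_between, ms_within)


def snedecor_rule(f, five_pct_point=3.89, one_pct_point=6.76):
    """Three-way decision on the Null Hypothesis from the tabled F points."""
    if f > one_pct_point:
        return "reject"
    if f < five_pct_point:
        return "accept"
    return "in doubt"


print(round(ms_within, 2), round(f_ratio, 2), snedecor_rule(f_ratio))
```

The recomputed ratio agrees with the tabled 5.66 and falls between the two tabled points, which is why the table leaves the hypothesis in doubt.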


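Table 5

Complete Analysis of Variance and Covariance—100 Twelfth Grade Boys
(Partialling out the Effect of Academic Ability)

                                                          Adjusted for Academic Ability
Source of   Degrees of  Sum of     Sum of      Sum of     Degrees of  Sum of    Mean            Test of
Variation   Freedom     Squares    Squares     Products   Freedom     Squares   Square   F      Hypothesis

Within        198       5417.12    18722.70    4931.12      197       4118.35   22.02
Between         1        154.88     1039.68     401.28        1         14.85   14.85    .7     Accept
Total         199       5572.0     19762.38    5332.40      198       4133.20

See footnotes for Table 4.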




American Council on Education Psychological Examination. When the 
data were adjusted for academic aptitude by means of covariance, as 
shown in Table 5, an F-ratio of only .7 emerged. Since this was not of 
a magnitude to justify rejecting the Null Hypothesis, the answer to 
Question 2 must be that the difference in academic grade averages due 
to OL was almost negligible. 
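
The covariance adjustment can be retraced from the sums of squares and products in Table 5. The sketch below is an illustration under stated assumptions: the column beginning 5417.12 is taken as the honor-point (criterion) sum of squares because it matches Table 4, the column beginning 18722.70 as the aptitude sum of squares, and the column beginning 4931.12 as the sum of products. The adjusted sum of squares is the usual residual after removing regression on the covariate. All names are hypothetical.

```python
# yy: honor-point ratios, xx: academic aptitude, xy: cross-products (assumed labels).
within = {"yy": 5417.12, "xx": 18722.70, "xy": 4931.12}
total = {"yy": 5572.00, "xx": 19762.38, "xy": 5332.40}


def adjusted_ss(s):
    """Criterion sum of squares remaining after regression on the covariate."""
    return s["yy"] - s["xy"] ** 2 / s["xx"]


adj_within = adjusted_ss(within)       # 197 degrees of freedom
adj_total = adjusted_ss(total)         # 198 degrees of freedom
adj_between = adj_total - adj_within   # 1 degree of freedom

f_ratio = (adj_between / 1) / (adj_within / 197)

print(round(adj_within, 2), round(adj_total, 2), round(adj_between, 2))
print(round(f_ratio, 2))
```

The recomputed adjusted sums of squares agree with the tabled 4118.35, 4133.20 and 14.85 to within rounding, and the resulting ratio of about .7 is far below the 5% point of 3.89.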


Summary 


1. A definite relationship was demonstrated between OL on one hand, 
and out-of-school and co-curricular evidences of drive on the other hand. 
Thus it appears that boys who evidence much energy and activity in the 
less formal school situations and in everyday life situations as a rule give 
responses on the Strong Blank which result in high OL. 

2. No relationship was demonstrated between OL and high school 
academic grade averages. The reasons for this can be only conjecture 
but a few of them are ventured. It might be that high school does not 
present a challenge to most boys with the result that marks which enable 
a boy to “get by” are satisfactory. The possibility that boys satisfy their 
desires to achieve through co-curricular activities and life situations 
cannot be ignored. Furthermore, it is common knowledge that high 
school marks are not always valid. Questionable marks could easily 
cause a relationship to fail to emerge. It might be pointed out further 
that the use of the Strong Blank among high school students is question- 
able due to the immaturity of high school students. Strong has pointed 
out that interest patterns change quite extensively during the high 
school years. He states “roughly speaking, one-third of the change in 
interests is between 15.5 and 16.5 years, one-third between 16.5 and 18.5
years, and one-third between 18.5 and 25 years (6, p. 259).” 


Received October 7, 1948. 


References 


1. Darley, J. G. Clinical aspects and interpretation of the Strong Vocational Interest 
Blank. New York: The Psychological Corporation, 1941. 

2. Guilford, J. P. Fundamental statistics in psychology and education. New York: 
McGraw-Hill Book Company, 1942. 

3. Kendall, W. E. The occupational level scale of the Strong Vocational Interest Blank
for Men. J. appl. Psychol., 1947, 31, 283-287. 

4. Ostrom, S. R. The OL key of the Strong Vocational Interest Blank for Men and
scholastic success at college freshman level. J. appl. Psychol., 1949, 33, 51-54. 

5. Snedecor, G. W. Statistical methods. Ames, Iowa: Collegiate Press, Inc., 1946. 

6. Strong, E. K. Vocational interests of men and women. Stanford, California:
Stanford University Press, 1943.




An Objective Evaluation of Counseling 
Barbara A. Kirchheimer, David W. Axelrod, and 


George X. Hickerson, Jr. 
University of California Counseling Center, Berkeley 


The development of objective criteria for evaluating the effectiveness
of counseling has traditionally been a matter of extreme difficulty.
Apparently, the first studies in the evaluation of faculty counseling are
the [...] unpublished studies of Paterson and Langlie at Minnesota; and
Lemon (6) at Iowa in the same year dealing with counseling by
[...] trained counselors. Lemon’s work consisted of intensive
[...] training for half of the lowest decile of students on the Iowa
[...] Examination. At the end of three years, Holladay (5), summarizing
Lemon’s study, reported that the “counseled” group were making
[...] academic adjustment than the equally weighted group left to
[...]. However, Freeman and Jones (4) in a final report of
the same group state that at the end of their college career there was
[...] difference between the two groups, because academic failure appeared
[...] the experimental group.

[...] of the “spoon-feeding” type of counseling is shown in the studies
[...]wley (3) with Ohio State Freshmen football players, Newland and
[...]y (7) with high school sophomores and Williamson (9). In Williamson’s
study made on Art College students, he found, as Paterson and
Langlie had previously found with Engineering students, that the grade
point average of probationary students was not improved by faculty
counseling. Williamson concluded that grade point average is not
adequate as a criterion of the effectiveness of counseling, or that other
counseling methods must be used than those involved in his study.

[...] years later, Williamson (10) showed significant increases in
[...] point ratio for a student group counseled by trained counselors at
the University of Minnesota Testing Bureau when compared with a
[...] non-counseled group. In a later study Williamson and Bordin
[...] use of subjective evaluations of adjustment and cooperation, in
[...] devices. In a further paper on this same study
[...] significantly better for a counseled group
[...] group. Since both these criteria are significant at the 1%
[...] wonders why it was felt necessary to go beyond the grade point




average and use the subjective composite criteria. In criticizing the 
techniques for evaluating counseling, Williamson and Bordin (12) feel 
grade point average is a poor criterion because of the dissimilarity in 
pattern of subjects taken. However, the alternative of using stand- 
ardized achievement tests has limitations in comparing achievement in 
a number of areas. Moreover, the fallibility of the measuring instru- 
ment itself must be emphasized. 

Blackwell (2) in a client-centered counseling program at the Uni- 
versity of Texas reports significant increases in grade point average for 
a counseled group of 40 compared with a matched non-counseled group. 
Ward and Tyler (8) at the University of Oregon show a slightly better 
record for a counseled group than for a matched non-counseled group 
in grade point average, as well as on their special scale attempting to meas- 
ure college adjustment. Beaumont (1) in a somewhat confusing article 
purports to show that discrepancies in academic adjustment were due 
in a large measure to differences in academic counseling. However, he 
points out the fact that most “academic” counseling is more concerned 
with subjugating the individual to the academic machine than with the 
integration of the individual’s personality. 

In most academic settings, grades alone are an objective indication of 
progress or adjustment. In view of the fact that grades are the only 
specific criterion of which we are in possession, that they lend themselves 
to objective treatment, and that, with all their weaknesses, they are 
the accepted gauge of academic success or failure, the present authors 
have adopted this criterion as the most workable measure so far available 
whereby to evaluate the success of a counseling program. 

In evaluating the effect of counseling, an amplified approach might 
include considering the results upon grades of change of major course of 
study. A change of major often accompanies vocational and/or educa- 
tional counseling, and the effect of such a marked step is insufficiently 
investigated. From comparison of pre- and post-counseled grades, we 
may have some clues to the effectiveness of the change of major itself, 
and of the professional counseling which produced it. 


Selection of Groups and Methodology 


Accordingly, it was decided to study veteran students at the Uni- 
versity of California, Berkeley Campus. High admission requirements 
and fairly rigid disqualification regulations result in a rather
homogeneous, high caliber population. The average grades of undergraduate
veteran students appear in Table 2. 

If any evaluation of counseling is to be made, the kind of counseling 

under investigation should be described. It is individual, consisting of




[...] interviews as are required to develop mutually a vocational
[...] educational plan, with all needed testing individually planned,
[...] of the occupational library maintained by the Occupational
Information Specialist of the Center. The psychological training and
experience of the Counselors and Psychometrists, and the services of a
[...] Psychiatrist insure that each counselee’s total personality
[...] is considered, rather than simple vocational or educational
[...]. Techniques are eclectic, with the constant objective of
[...] an optimal, realistic plan. A coordinate objective is the
[...] of the counselee so that he may carry out the plan. Counseling
is concerned with the development of the individual rather than with the
[...] of grades. An important point is that the educational
[...] joint agreement between counselor and counselee. Counseling
[...] not be superimposed but must be the result of mutual understanding
and acceptance.

[...] of the dual approach in determining effect on grade average
[...] counseling, and of change of major, with and without counseling, the
following groups were used:

Counseled Change.            Changed major as a result of a mutual decision
                             of Counselee and Counselor.

Non-Counseled Change.        Changed major without any contact with the
                             Counseling Center.

Counseled No-Change.         Continued same major as a result of a mutual
                             decision of Counselee and Counselor.

Non-Counseled No-Change.     Continued same major without any contact with
                             the Counseling Center.


inseled rae were selostaa an Ri 

ersity of California, Berkeley, opera ing uni 

Administration for advisement of veterans. All veteran students 

I received counseling under cee Law 346 (G. I 
istance for a variety of reasons. 

Berkeley; Counseling Center has unfortu- 
October, 1946, and few cases of veterans 


changed majo 


ent ence to, an one semester following, counseling. 


‘eviewed by a clerk with instruc- 
kolied as an undergraduate in the 
ing, who signified intention at the 
different Department, School, or 
‘fornia the following semester. No case 


to select every cai 
y at the time of r 
seling interview © 
ag the Universi i 
ese i ts wa: . 

There was considerable range in type of change. The only common change
was from some form of Engineering (Mechanical, Electrical, Civil, and
Industrial) to Business Administration, which accounted for six of the thirty-five




cases. Examples of other changes were Chemistry to Architecture, Physics to 
Social Welfare, Forestry to Agricultural Economics, Chemistry to Psychology. 

These students must have completed their counseling one full semester 
preceding the selection in order that grades following the change might be 
obtained. It had been hoped that grades two semesters before and two semes- 
ters after counseling might be obtained, but this was not possible at this time. 
Williamson’s (11) interpretation of his data is that the effect of counseling is
apparent in the first quarter following counseling, and no further increase in
grades occurs in succeeding quarters. Unfortunately his hypothesis cannot be
explored with our groups at this time.

The number of our Counseled Change Group, for these reasons, was only 35,
and for purposes of comparison other groups were constituted of equivalent
size. Various studies have matched groups on the basis of intelligence, sex,
age, etc. Williamson (12) points out that in such matching it is impossible to
include such significant variables as motivation, personality, or emotional 
stability. Matching on intelligence test scores might have been desirable, but 
no such scores were available for those groups which had not gone through the 
Counseling Center. An unpublished study made at this Center by William R. 
MacKay compares the grade point average of those veteran students availing 
themselves of the Center’s facilities with the general grade point average of all 
veteran students at the University. This study showed no significant difference
in grade point average of the counselee, and also that on the ACE Psychological
Test, the average counselee score was at the 82.7 percentile (σ 20.16).
As already pointed out, the high admission requirements and fairly rigid 
disqualification regulations result in a rather homogeneous high caliber popu- 
lation within the University. The present authors, therefore, feel that a 
random sample drawn from the University population may be assumed to be 
roughly equivalent in Lege gan and that no matching between groups was 
important other than that all subjects should be undergraduate male veteran 
students at the University of California, Berkeley. 

Only six changes of major were made by members of the Counseled Change 
Group between Fall 1946 and Spring 1947, and 29 such changes occurred 
between Spring 1947 and Fall 1947. For this reason, for the other groups the 
two semesters Spring 1947 and Fall 1947 were used for comparison, and are 
designated 1st and 2nd semester. 

The Non-Counseled Change Group was, like all other groups, collected by a
clerk, who reviewed University alphabetical records of veterans for these two
semesters, selecting the first 35 males who registered a change of major between
these semesters, and who had at no time contacted the Counseling Center.

The Counseled No-Change Group consisted of the first located 35 males in 
the Center’s files who were enrolled in the University as undergraduate students 
for the two semesters in question, and who did not in this period change their 
majors. All cases satisfying these criteria were retained. 

The Non-Counseled No-Change Group was selected in the same manner as
Group II, except that the first 35 cases enrolled both semesters who did not
change major and who had not contacted the Counseling Center comprised
the group.

The college year distribution of the Counseled Change Group (Group I) is
as follows: Freshmen 6; Sophomores 17; Juniors 11; Seniors 1. The year
distributions of the other 3 groups very closely approximated this, with very few
students who were not divided between the Sophomore and Junior years.

Grade point average at the University of California is computed on the
following basis:

Three points per unit of credit for A, 
Two points per unit of credit for B, 

One point per unit of credit for C, 

No points per unit of credit for D and F. 




The sum of the grade points divided by the number of units for which
registered gives the grade point average (G.P.A.).

In the handling of our data, the significance of the differences obtained was
computed according to the formula for the critical ratio of the difference over
the standard error of the difference. A value of the critical ratio of 1.96 is
reliable at the 5% level, and a value of 2.58 is reliable at the 1% level.
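
The correspondence between a critical ratio and its level of significance can be checked with a short sketch. It assumes that the critical ratio is referred to the normal curve and that the test is two-tailed, which is consistent with the values 1.96 and 2.58 quoted above; the function name is illustrative only and is not taken from the study.

```python
import math


def two_tailed_level(critical_ratio: float) -> float:
    """Two-tailed normal probability associated with a critical ratio.

    Assumption: the ratio is treated as a standard normal deviate.
    """
    return math.erfc(abs(critical_ratio) / math.sqrt(2.0))


for cr in (1.96, 2.58):
    print(f"C.R. = {cr:.2f} -> level = {two_tailed_level(cr):.3f}")
```

Under these assumptions a ratio of 1.96 falls at about the 5% level and a ratio of 2.58 at about the 1% level.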


Results 


de point averages for the two semesters studied for all four 
e given in Table 1. 


Table 1 
Summary of Grade Point Average and Changes 


62 1.39 .68 —.07 13 1.00 to’ AT 


From Table 1 it may be noted that the Counseled Change Group improved
from slightly better than a C average (1.13) to a B− average,
a gain of .52 grade points, a change which is significant at better
than the 1% level.
Since it was felt that grade point average may be affected by elective
courses not pertinent to the major, a calculation was also made of only
J >. Two cases were necessarily elimi- 
had officially made a 


transfer, they ha 
these two cases Wer g 
ing the statement of change. For the 
thirty-three cases the mean grade point average in the major
only before the change was .946 (a level of deficiency) and afterward
1.68, or an increase of .734 grade point on the average.
students of the 35 received a grade point average of less than 
i 23 work prior to counseling, while 
student received a grade point average of less than 1.00 oe 
A number of individual examples may be cited. One student




who had a .77 grade point average (down grade points) with C’s and D’s, 
under a new major the following semester rated three A’s and 1 B or an 
A— average (2.75). One student receiving 2 D’s and 2 F’s improved to 
four C’s. 
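
The individual examples can be verified against the grade point scale described in the method (A = 3, B = 2, C = 1, D and F = 0 points per unit). The sketch below is illustrative only: the course unit values are not reported, so it assumes every course carries the same number of units, and the function and variable names are hypothetical.

```python
# Grade points per unit as stated in the text: A=3, B=2, C=1, D=0, F=0.
POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "F": 0}


def grade_point_average(courses):
    """Total grade points divided by total units.

    courses: list of (letter_grade, units) pairs.
    """
    total_units = sum(units for _, units in courses)
    total_points = sum(POINTS[grade] * units for grade, units in courses)
    return total_points / total_units


# Assumption: each course carries 3 units (unit values are not given).
three_as_one_b = [("A", 3)] * 3 + [("B", 3)]
two_ds_two_fs = [("D", 3)] * 2 + [("F", 3)] * 2
four_cs = [("C", 3)] * 4

print(grade_point_average(three_as_one_b))  # 2.75
print(grade_point_average(two_ds_two_fs))   # 0.0
print(grade_point_average(four_cs))         # 1.0
```

With equal units, three A's and one B give the 2.75 average quoted, and the student moving from two D's and two F's to four C's goes from 0 to 1.00.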

Whereas the Non-Counseled Change Group had only a slightly higher 
grade point average initially than the Counseled Change Group, its in- 
crease (.24) with a change of major was less than half as great, a change 

significant only at the 9% level. 
The change in grade point average of the Counseled No-Change Group
(.08) and the Non-Counseled No-Change Group (−.07), with Critical
Ratios of less than 1, were not significant changes.

It had previously been found by the Coordinator of Veteran Affairs 
of the University of California, Berkeley Campus, that grades were 
inversely related to size of study load, i.e., number of units carried. For 
undergraduate students the averages were as follows: 


Table 2
Grade Point Average Compared with Average Number of Units Carried

                    Study      Grade Point
Semester            Load       Average
Fall, 1945          11.2       1.91
Spring, 1946        12.5       1.57
Fall, 1946          13.5       1.53
Spring, 1947        14.1       1.37
Fall, 1947          14.2       1.41
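
The inverse relation between study load and grades in Table 2 can be illustrated by a product-moment correlation over the five semester averages. This is a sketch added for illustration, not a computation reported by the authors; with only five aggregate points it shows the direction of the relation and nothing more.

```python
import math

# Semester averages as given in Table 2.
study_load = [11.2, 12.5, 13.5, 14.1, 14.2]   # average units carried
gpa = [1.91, 1.57, 1.53, 1.37, 1.41]          # grade point average


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(study_load, gpa)
print(f"r = {r:.2f}")
```

The coefficient is strongly negative, in keeping with the statement that grades fell as the number of units carried rose.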


It was felt, therefore, that such an increase as shown by Group I 
might be partially a result of a decreased study load. As can be seen in 
Table 3, the study load of the Counseled Change Group went up, and 
therefore, such explanation for their higher average must be rejected. 
The Non-Counseled Change Group on the other hand did decrease their 
study load slightly. However, both of the No-Change Groups were 
carrying a slightly heavier program in the second semester than were the 
change groups, but again the counseled group was slightly more heavily 
loaded than the non-counseled, although they decreased rather than in- 
creased their program. 

As can be seen, the most significant difference, much beyond the
1% level, is between the Counseled No-Change and Non-Counseled No- 
Change Groups. The difference between the Counseled Change and Non- 
Counseled Change Groups is significant at the 7% level. 




For purposes of comparison, groups were combined to increase their
numbers. From Table 1 it is evident that both groups which did not change
major have a higher initial grade point average than the groups which
did. Combining them, thus giving two groups with an N of 70
each, we find that the No-Change Group (III and IV) has an initial
grade point average of 1.46, σ .11, while the Change Group (II and I) has
an average of 1.15, σ .11. The difference of .31 has a critical ratio
significant at the 1% level, indicating that the students who
changed majors, whether counseled or not, had a significantly lower grade
point average initially than those who did not change. This may indicate
that a group whose grades were below their potentialities endeavored
to improve them by a change of major or by seeking counseling.

of the No-Change Group also sought counseling. This may 
te that better grades are achieved by those in appropriate 


study. 


Table 3
Change in Average Study Load (units)

                              1st         2nd
Group                         Semester    Semester    Change
Counseled Change              13.62       14.31       +.69
Non-Counseled Change          14.23       14.00       −.23
Counseled No-Change           14.77       14.70       −.07
Non-Counseled No-Change       14.23       14.50       +.27

Table 4 
Critical Ratio of Differences of Changes Between Groups 


-l i I. Couns. IV; Non-Couns. 
II. Non one uba N 


5.22 
Change 1.84 be oa 
oo 5.55 


No-Change 


Combining the groups according to whether counseled or not, we
find that the Counseled Group (I and III) makes an increase of
.30, σ .62, while the Non-Counseled Group (II and IV) makes
an increase of .09 in grade points, σ .47, a difference of .21 grade points
in favor of the counseled groups. This difference has a Critical Ratio
significant at the 2% level.
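
The comparison of the combined counseled and non-counseled groups can be worked through with the critical ratio formula described in the method. The sketch below is illustrative and rests on assumptions not stated in the text: that the two σ values (.62 and .47) are standard deviations of the individual gains, that each combined group has an N of 70 (two groups of 35), and that the groups are independent. The function name is hypothetical.

```python
import math


def critical_ratio(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Difference between two means over the standard error of the difference.

    Assumes independent groups and that sd_1, sd_2 are standard deviations.
    """
    se_diff = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / se_diff


# Gains of .30 (sigma .62) and .09 (sigma .47); N of 70 per group is assumed.
cr = critical_ratio(0.30, 0.62, 70, 0.09, 0.47, 70)

# Two-tailed normal probability for the ratio.
level = math.erfc(abs(cr) / math.sqrt(2.0))
print(f"C.R. = {cr:.2f}, two-tailed level = {level:.3f}")
```

Under these assumptions the ratio comes out a little above 2.2, which agrees with a difference significant at about the 2% level.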


Discussion 

As has been mentioned, the groups in this study were necessarily 
small, and therefore the conclusions that may be drawn are limited. 
It is hoped that when possible this study will be repeated with a larger
sample. Since the methodology of this study has afforded suggestive 
results it is also to be hoped that this study will be repeated with other 
populations. 

We feel that the use of the criterion of grades is warranted in view of 
their importance for the survival of the student, his future opportunities 
for professional training, or for employment. We fully recognize, how- 
ever, how few aspects of “counseling effectiveness” such a criterion may 
evaluate. It is a task of the future to develop criteria for these less 
objective areas. When this problem has been mastered, it may be found 
that additional criteria will show more clearly the value of vocational 
and educational counseling. 

It is particularly apparent in this study that most students with 
academic deficiencies eradicated these deficiencies in the semester fol- 
lowing counseling, regardless of whether they changed their major. 
These data imply the social value of counseling in the salvaging of 
deficient students. However, it is no less clear that students making 
satisfactory grades can benefit from counseling. 

From the standpoint of evaluating counseling, we cannot, of course, 
generalize beyond the particular type of counseling under study. With- 
out careful, intensive vocational and educational counseling on an indi- 
vidual basis, with concern for the individual as a whole, results may differ. 

The improvement of grades by counseled students might be attributed 
to whatever factors differentiated those students seeking counseling 
from those who do not, a possibility considered in similar studies. With 
the inclusion of a group who changed majors without counseling, we feel 
that we have effected some equalization of whatever factors may lead 
students to take action of one kind or another to improve their situation.
As shown in this study the improvement made by those who were coun- 
seled and changed major is considerably greater than that made by those 
who changed major independently. We cannot, of course, demonstrate 
conclusively that the scholastic improvement of the counseled groups 
as compared with the non-counseled groups was due to the counseling, 
since counseling itself is a complex of many variables. Such a possibility
must, however, be considered. The other studies, with similar counseling,
have in general shown similar results.


Summary 
1. A group of male veteran undergraduate students who changed their
majors as a result of counseling improved their grade point average
significantly, despite an increase in number of units carried. Improvement
is even more marked if only major subject course grades are
considered.

2. The difference in grade point average improvement between two
groups of male veteran undergraduate students who did not change their
majors, one of which received counseling, was significantly in favor of the
counseled group, at better than the 1% level.

3. When non-counseled and counseled groups were compared, the
counseled students increased their grade point average by an amount
greater than the non-counseled students with a significance at the 2% level.


er 1, 1948. 
References 


„H. The evaluation of academic counseling. J. higher Educ., 1939, 10, 

32, 116. 

ell, Æ. B. An evaluation of the immediate effectiveness of the Testing and 
nee Bureau of the University of Texas. J. educ. Res., 1946, 40, 302, 308. 

„W.H. An experiment in freshman counseling. J. higher Educ., 1933, 4, 

5-248 

H. J., and Jones, L. Final report of the long time effect of counseling 

percentile freshmen. Sch. & Soc., 1933, 38, 382-384. 


„P. W. The long time effect of freshman counseling. Sch. & Soc., 1929, 


234-236. 

on, A. C. An experimental study of guidance and placement of freshmen in the 
st decile of the Iowa Qualifying Examination, 1925. University of Iowa 
tudies in Educ. III (1927), 8, University of Iowa. 
wland, T. E., and Ackley, W. E. An experimental study of the effect of educa- 
nal guidance on a selected group of high school sophomores. J. exp. Educ., 


6, 5, 23-25. 
J. R., and Tyler, L. E. A preliminary report of an evaluation of the Veterans 
ation counseling service in the University of Oregon. Amer. Psychol., 
2, 416. -i 
mson, E.G. The role of faculty counseling in 


sychol., 1936, 20, 314-324. ; 
l on, E. G. A summary of studies in the evaluation of guidance. Rep. 
eenth Annl. Mtg. Coll. Personnel Assn., 1938, 73-77. 
filliamson, E. G., and Bordin, E. S. Evaluating co! 


: ip experiment. Sch. & Soc., 1940, 52, 434-440. K s; 
fe G. and Bordin, E. 8- Tho evaluation of vocational and educational 


: a critique of the methodology of experiments. Educ. & Psychol. 


1941, 1, 25-34. j i : 
m on, E. G., and Bordin, E. 8. A statistical evaluation of clinical counseling. 


ve. & Psychol. Msmt., 1941, 1, 117-132. i 
nC. G. Recent research in counseling. Rep. Sixteenth Annl. Mtg. Coll. 


el Association, 1939, 88-94. 


scholastic motivation. J.appl. 


A Follow-up Study of Social Guidance at the College Level * 
Margaret Glockler Aldrich 


University of Missouri 


In 1940, the author published a research report entitled “An Exploratory
Study of Social Guidance at the College Level.”¹ Early in 1948
it was decided to check the available records of the girls who as college 
freshmen (1939-1940) were the subjects for the experiment. It was felt 
that eight years would be sufficient for them to have completed their 
undergraduate careers. In checking the records it was found that only 
one girl was still in residence at the University of Minnesota in 1947-1948. 
She returned to school under the G. I. Bill and her transcript looks as 
if she might soon fulfill the requirements for graduation. 

The original study was an attempt to compare two groups of freshmen 
girls who were all cases at the University Testing Bureau.² All of the
girls went through the usual testing and counseling procedures of the 
Bureau. The experimental group received additional guidance in the 
social adjustment area and were directed toward participation in
extra-curricular activities. It consisted of at least one added interview with
each girl in the experimental group stressing her social and activity life. 
In most cases this resulted in a definite contact with one or more of the 
activities in which the girl expressed an interest. The organizations had 
been contacted concerning the general need for good cooperation between 
various campus agencies. They did not know, however, that these girls 
were in any way “special cases.” It seems safe to assume that the girls
in the experimental group were also exposed to the usual social and extra- 
curricular program in the same way that all freshmen girls are exposed. 


* This follow-up, made while the writer served in the Student Counseling Bureau, 
Office of the Dean of Students, University of Minnesota, was made possible through the 
cooperation of many individuals and agencies. Mention should be made of the
following: Dr. E. G. Williamson, Dean of Students; Mr. John Foley, head of the
Disciplinary Committee of the Office of the Dean of Students, who suggested the
follow-up study; Dr. Ralph Berdie, Director of the Student Counseling Bureau;
Mr. James Borreson, Director of the Student Activities Bureau; and Dr. Robert
Hinckley, head of the Mental Hygiene Clinic of the Students’ Health Service.
Special thanks are due the author's adviser, Professor Donald G. Paterson, who
suggested and guided the 1940 study.
a e Educational and Psychological Measurement. Vol. II, No. 2, April, 1942, pp. 20% 


2 UTB is now called Student Counseling Bureau. 


ontrol group had no added counseling but, of course, the girls 
up were free to make use of the University social and extra- 
program. At the end of the school year both groups were 
‘on several personality scales and given a questionnaire. The 
reached at that time was: “All of these findings combine to 
e that, from this small sample, social guidance and directed partici- 
extra-curricular activities improve the ‘social adjustment’ of 
girls as measured by personality scales and a questionnaire. 
ly do the girls in the experimental group make greater mean 
but they feel that they have more friends, participate in more 
ities, and are less critical of the social program than the control 
. A treatment that makes people feel better satisfied with their 
fe is certainly worthy of further consideration.” * 
d be pointed out that the study involved a very small sample, 
imental and 28 control subjects. Also, both groups were origi- 
selected from the lower end of the distributions for freshmen girls 
“Minnesota Inventory of Social Attitudes—Forms P and B and 
activities in high school. They did not differ significantly, how- 
m the rest of the freshmen girls in mean ACE Psychological 
tion score or in mean Cooperative English Test score. The 
nental and control group were remarkably alike at the original 
‘on six objective measures (ACE, Coop. Eng., Social Beh., Social 
Rundquist-Sletto Inferiority Scale, and Bell Adjustment Inventory 
) and on high school group and individual activities. The con- 
oup was somewhat higher in high school scholarship rank. 
period of time (9 to 12 months), 


e the study covered only a brief ( 2 
worth while to re-study the groups after a period of eight years. 
t the gains revealed in 


valuation would indicate whether or noj 
ginal study were ephemeral or were permanent. i 
‘ollow-up was confined to a check of the records kept by various 


Sagencies. The following agencies were contacted: Student Coun- 
Bureau; Student Activities Bureau; Bureau of Admissions and 
Disciplinary Committee; Mental Hygiene Clinic of the Student’s 


Service; and the Alumni Association. ; 
tudy, a new card was made for each girl 


he girl belonged to the experimental or 
e sent to the agencies undesignated. This 
dings involve judgments. When 


ted for analysis. 
cit., p. 216. 




Results 


Student Counseling Bureau Records. The Bureau records consist of a 
folder for each girl with her test results and a record dictated by the 
counselors of all counseling contacts in the Bureau. Table 1 summarizes 
the quantitative information and indicates that the experimental group 
had made, on the average, slightly more contacts over a slightly longer 
period of time.⁴ The mean number of counseling contacts for both
groups is considerably higher than the Bureau average of about two for 
these years. 


Table 1
Student Counseling Bureau Contacts

                    Mean        Mean No. of        Mean Duration
                    No. of      Contacts After     of Contacts
Group               Contacts    Retesting          in Mos.
Control
  N = 24*           4.58        1.33               14.1
Experimental
  N = 31            5.74        1.65               15.5


* Four of the 28 girls in the control group were counseled by a counselor for the 
College of Science, Literature, and Arts. The folder of test results was kept by the 
U. T. B., but the interview records were kept in the S. L. A. office. Since these records 
are destroyed after five years, these girls had to be omitted from this part of the study. 


Student Activities Bureau. In the years 1936-1946 the Student Activities
Bureau kept records of the extra-curricular activities of all students
in the University. The records were tabulated each quarter by the 
Bureau staff from their membership, committee, and officer lists and 
from publicity in the college newspaper. The director of the Bureau 
feels that the records are not too accurate and err in the direction of 
omitting activities. 

The information from these cards has been summarized in Table 2 as 
mean number of activities, committees, and offices per year for the 
number of years the particular girl was in school. It must be emphasized 
that these are approximations and if anything underestimates. Never- 
theless the results indicate that the girls in the experimental group 
participated in more activities, served on more committees, and held 
more offices than those in the control group. 


⁴ Statistical tests of significance of differences have not been computed because of
the small N's and a belief that the chief value of the original study, and the present
follow-up, is to be found in the control group method of investigating the area of



Table 2
Student Activity Bureau Record

                    Mean No. of    Mean No. of    Mean No. of
                    Activities     Committees     Offices
Group               Per Year       Per Year       Per Year
Control*            .62            .03            .08
Experimental
  N = 30*           .96            .26            .27


* There were no cards in the files for two girls in the control group and one in the
experimental group.


Record. A second source of activity record is the yearbook 
or class, the Gopher. Each senior records his own activities 
sin college. The results from this source are very incomplete. 
ly available for the girls who actually graduated and many of 
did not have their picture and activity record included in the 


Table 3
Activity Record from Gopher (College Yearbook)

                    Mean No. of    Mean No. of    Mean No. of
                    Activities     Committees     Offices
Group               Listed         Listed         Listed
Control             2.5            .25            .38
Experimental        4.6            .60            1.00


le 3 is based on the records of 18 girls who were included in the 

a little over 50 per cent of those who graduated). The years 
2, 43, 744, and ’46 were checked since these are the years listed 

duates on the official transcripts. This rather skimpy evidence 
ints in the direction of greater activity for the experimental group. 
e data can be compared with the Activity Bureau record, Table 
> each mean by 4 to get the mean per year. The results 


y similar as shown in Table 4. : s j ' 
e results raise an interesting question which might be investi- 
ther. There is a common idea that students tend to over- 
lication. This small sample did 


their activity record for pub! s 
E an r that there is some evidence that 


particularly if we remembel 
ivity record is an underestimate. 




It should be added that the records of Mortar Board (Senior Women’s 
Honorary) were also checked for these years. Mortar Board picks its 
members from the entire junior class on the basis of scholarship, leader- 
ship, and service. Through the years at Minnesota this group has 
tended to include the leaders in extra-curricular activities if their grades 
were up to a certain fixed level. Two girls from this study were elected 
to the 1943 chapter of Mortar Board. They were both members of the 
experimental group. 


Table 4
SAB Activity and Gopher Record Compared

                 Mean No. of       Mean No. of       Mean No. of
                 Activities        Committees        Offices
                 Per Year          Per Year          Per Year
Group            Act.    Goph.     Act.    Goph.     Act.    Goph.
Control          .62     .83       .03     .08       .08     .09
Experimental     .96     1.15      .26     .15       .27     .25


Bureau of Admissions and Records. Data from the Bureau of Ad- 
missions and Records consisted of a transcript for each girl. Table 5 
summarizes these data. 

The academic records of the two groups are similar although the 
control group had a somewhat higher average. It might be well to recall 
that they also had a slightly better high school academic record. 


Table 5
Information from Official Transcript

                    Per Cent      Mean       Mean No. of
Group               Graduated     H.P.R.*    Quarters at Minn.
Control
  N = 15            54            1.51       8.86
Experimental
  N = 18†           58            1.12       8.87


* H.P.R. = honor point ratio = honor points/credits, where for each credit of A, 3;
B, 2; C, 1; D, 0; and F, −1 honor points are given. These were calculated only from
University of Minnesota grades and for undergraduate work.

† This figure omits two A.A. (Associate of Arts) degrees, a two-year degree granted
by the General College. There were 5 girls who did some work in General College.
Their records are not included in Column 2 since General College grades are not directly
comparable to grades in other colleges.
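
The honor point ratio defined in the footnote to Table 5 differs from an ordinary grade point average in that each credit of F subtracts one honor point. A minimal sketch follows; the transcript shown is hypothetical and the function name is illustrative, since no individual records are reported.

```python
# Honor points per credit as defined in the footnote: A=3, B=2, C=1, D=0, F=-1.
HONOR_POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "F": -1}


def honor_point_ratio(record):
    """Honor points divided by credits.

    record: list of (letter_grade, credits) pairs.
    """
    credits = sum(c for _, c in record)
    points = sum(HONOR_POINTS[grade] * c for grade, c in record)
    return points / credits


# Hypothetical transcript, for illustration only.
record = [("A", 5), ("B", 3), ("C", 3), ("D", 2), ("F", 2)]
print(round(honor_point_ratio(record), 2))
```

Here 22 honor points over 15 credits give a ratio of about 1.47; had the F carried no penalty the same record would yield 24 points.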




Disciplinary Committee Records. The list of girls was sent to the
head of the Disciplinary Committee of the university. He reported
that none of the 59 names was recorded in the files of that committee.

Mental Hygiene Clinic. The list of names was also sent to the head
of the Mental Hygiene Clinic in the Students’ Health Service. He had
the names checked against the clinic records. Six girls had contacted
the Clinic, three from the Control Group and three from the Experimental
Group. In each case he made an estimate of severity of diagnosis with
the result that those from the Control Group were labelled “severe”
whereas none from the Experimental Group were so designated.

Although the psychiatrist reports that about 5 per cent of the University
population would like to make contact with the Clinic, he estimates
that through the years the Clinic has had facilities for only about 3
per cent. This is much lower than the 10 per cent of both the experimental
and control group who went to the Clinic. This might be explained
by the original selection of the groups from the lower end of
the distributions on the personality scales. One might hypothesize that
social guidance did little to prevent the development of problems
concerning mental hygiene but that these problems were less severe for
the girls who had the earlier specialized help. Obviously this is little


Table 6
Per Cent of Married Graduates *

                    Married               Not Married
Group               No.     Per Cent      No.     Per Cent
Control             10      71%           4       29%
Experimental        11      61%           7       39%


Alumni Association Records. The records of the Minnesota Alumni
Association are kept only for those who actually graduate from
the University of Minnesota. For each graduate there is a fairly complete
record of address and married name. The record at the time of this study
is summarized in Table 6. Clearly from these incomplete
records a higher percentage of the control group married. If we
consider marriage an indication of social adjustment, the control group
(at least those who graduated) is better adjusted. This is the only
piece of evidence in favor of the control group.




Summary 
This follow-up of social guidance can be summarized in three sections. 


1. Those who received special guidance with social problems exceeded 
the control group in: (a) the number of contacts with the Student Coun- 
seling Bureau; (b) the mean number of college activities, committees, 
and offices; (c) the percentage graduating from the University of Minne- 
sota; and (d) a less severe diagnosis for those who contacted the Mental 
Hygiene Clinic. 

2. The groups were much alike in: (a) the mean number of months 
over which the contacts with the Student Counseling Bureau were made; 
(b) the number of quarters in residence at the University of Minnesota; 
and (c) the number of girls who contacted the Mental Hygiene Clinic. 

3. The control group was slightly higher than the experimental group 
in: (a) mean honor point ratio; and (b) the percentage of the graduates 
listed in the Alumni Bureau files as married. 


The small numbers in both groups make more detailed statistical 
analysis of questionable value. From the data available, however, there 
is an indication that the gains originally reported for the socially guided 
group continued throughout their college residence. Again, the tentative 
conclusion of the original study can be re-emphasized with the caution 
mentioned in the last sentence of that study, “the problem was, however, 
essentially an investigation of a method and as such the results should 
be emphasized only as a justification for the further use of the method.” 


Received October 22, 1948. 


Memory in Radio News Listening 


Thomas W. Harrell, Donald E. Brown, and Wilbur Schramm
University of Illinois


tions of practical importance have arisen in the field of radio 
the extent to which a listener is able to remember what he 
on a newscast. The newscaster is anxious to know how tightly 
| “pack” his newscast—how many stories he can put into a given 

ut giving his audience more than they can absorb. Beyond 
wants to know the effect on memory of repetition within the 
He is interested in what kinds of subject matter and what 
ts of those are remembered better than others. He would like 
whether his audience listens for “index words,” whether it re- 
names and details, whether it remembers items far removed in 
well as it remembers items originating nearby. Finally, he 
e to know, if possible, what kinds of items discriminate least 
good memories and poor memories, and therefore, so far as the 
memory is concerned, are mass materials for a mass medium. 
ituation wherein the average adult American listens more than 
a day to the radio, between 10 and 25 per cent of this time to 
, these questions become of social as well as professional im- 
. The study reported here was undertaken in an attempt to 
e experimental data in an area where the hunch and the thumb 


Method 

Two entirely different news broadcasts each containing 20 stories
were written by an experienced news editor. These were designated as
newscasts IA and IIA. These 20 stories were reduced in size but with
care taken not to omit any important detail, to permit the addition
of 10 stories making a total of 30 stories in newscasts IB and IIB.
sts actually ran 12}4 minutes in order to make them the same 
are most commercial casts on the radio. It was not thought 
for the purpose of this study to insert commercials.) 

he next series of newscasts which were written in a highly com- 
yle each of these 30 stories was further reduced in length which 
time for the addition of 10 new stories, making a total of 40
(newscasts IC and IIC).




All six newscasts were transcribed. An experienced announcer read 
the casts, transcribing only two per day to avoid staleness. Each cast 
was transcribed to a platter, tape, and wire. It proved more convenient 
to use the tape transcriptions except for one presentation of a platter 
recording. 

The casts were fictitious but plausible. Real happenings were not 
presented because there would then have been some persons who were 
already more familiar than others with the event. That the casts were 
highly realistic was suggested by the questions of some subjects, who, 
even though assured of the contrary, would inquire whether “there was 
anything to” one or more of the stories. 

Memory tests were constructed for each cast. Four alternative 
multiple choice questions with single best answer were used. The 
scoring formula, Right - 1/3 Wrongs, was used to correct for chance
successes. One question was asked on each story so there were 40 
questions each on casts IC and IIC, 30 on casts IB and IIB, and 20 on 
casts IA and IIA. All the questions on A casts were repeated in B casts, 
and all questions in B casts were repeated in C casts. The aim was to 
make the questions central to the story and as easy as possible while at 
the same time assuring discrimination from guessing. 
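The chance correction described above can be sketched in a few lines. This is a minimal illustration, assuming the usual correction for k-alternative items (wrongs divided by k - 1, which gives one third for four alternatives); the function name and the sample counts are hypothetical and do not come from the study.

```python
from fractions import Fraction


def corrected_score(rights: int, wrongs: int, alternatives: int = 4) -> Fraction:
    """Return rights minus wrongs/(alternatives - 1).

    Omitted questions count as neither right nor wrong.
    """
    if alternatives < 2:
        raise ValueError("a multiple choice item needs at least two alternatives")
    return Fraction(rights) - Fraction(wrongs, alternatives - 1)


# A listener guessing blindly on 40 four-choice questions expects
# about 10 right and 30 wrong, which the formula scores as zero.
print(corrected_score(10, 30))

# Hypothetical listener: 25 right, 12 wrong, 3 omitted on a 40-question test.
print(corrected_score(25, 12))
```

Exact fractions are used here only so that the zero-for-pure-guessing property shows up without rounding noise.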

Two casts each were presented to ten groups of subjects. Each I 
cast was presented to a group which also heard a II cast of a different 
number of stories. The order of presentation was reversed from one 
session to another because of the possibility of a practice or fatigue effect. 

The method is recognized as being not true to life. In the first place 
the casts were fictitious. In the second place the subjects were as- 
sembled and had fewer distractions than do radio listeners ordinarily. 
It is expected that the experimental conditions would yield a maximum 
of what could be remembered in real life. It is believed that when 
listening to news on the radio the majority of listeners do not give as 
good attention as in the experimental setting. On the other hand, one 
could conjecture that because of the fictitious nature of the news it would 
not be attended quite so well as if it were real, so there would be some 
compensating effect to the extraordinary attention. Some thought was 
given to using real news casts and actual listeners, but the expense of 
such a study was found to be prohibitive. 

Each group of listeners was told the purpose of the study, that there
would be a memory test after each cast, and that there would be a pref- 
erence question after both casts. 

An effort was made to choose as subjects adults similar in education 
to the average of the American radio listening audience. The majority 
of subjects were enlisted men and women of the United States Air Force.




Memory in Radio News Listening 267 


The Air Force enlisted men and women were members of the permanent
party of a base. Their average educational level was in the
neighborhood of 10th grade. Their standard scores on the Army General
Classification Test ranged from approximately 90-115, which is
practically the [...] range of the complete adult population of military age.

The subjects also included two groups of nonacademic employees
and [...] groups of students at the University of Illinois. One of the
groups of nonacademic employees was a group of groundsmen whose
educational level was similar to that of the Air Force subjects. The
other group of nonacademic subjects were supervisors whose educational
level ranged from high school graduation to college graduation.
The student subjects were undergraduates. The subjects were somewhat
above the average of the American public in education and considerably
above the average of the radio listening audience, but since
most of the subjects were within the average range in education, they
were regarded as satisfactory for the experiment.


Results: I. Memory and the Number of Stories 


A reasonable hypothesis is that if a listener is presented a progressively
larger number of items within a fixed period of time, he will remember
a progressively smaller proportion of them. The results bear this out,
as Table 1 shows.


Table 1
Memory for Broadcasts

Cast                  N     Mean %   σ of mean   σ       Mean items
A (19-20 items)*      320   54.5     1.20        21.55   10.9
B (29-30)*            264   49.3     1.12        18.30   [illegible]
C (40)                308   45.9     1.09        18.50   [illegible]

* One item had to be omitted from scoring in two sets of the tests.


Table 2 shows the statistical significance of these differences.
A further test of these figures is given in Table 3, which shows positive
and significant correlations between each pair of tests [...]
that listeners who were high on one test tended to be high on the
other. Therefore, the listeners must have been attending, and
the tests were measuring the same thing, whatever it was they were [...]




It seems to be a tenable hypothesis, then, that a listener remembers 
a smaller proportion of items in a fixed-time newscast if the number of 
items is increased from 20 to 30 to 40. The question then follows: where 
is the point of insufficient return? Where does the memory curve fall 
off so sharply that the newscaster may conclude he has overpacked his 
newscast? 


Table 2
Probability that Memory Differences May Be Due to Chance

Differences                 Probability
A and B (19-20 and 30)      .0012
B and C (29-30 and 40)      .2013
A and C (19-20 and 40)      .0000


Table 3
Coefficients of Correlation Between Scores on Each Pair of Tests

Tests       r      N
IA-IIB     .63     77
IA-IIC     .41    127
IIA-IB     .72     61
IIA-IC     .43     55
IB-IIC     .51     80
IIB-IC     .68     46


While the differences shown in Table 1 are significant, they are 
nevertheless slight. In fact, they are so slight that a listener actually 
remembers more items from a 30-item cast than from a 20-item cast, 
from a 40 than from a 30. In a 20 item newscast 11 stories are re- 
membered, in a 30 story newscast 15 stories are remembered, and 18 
stories are remembered in a 40 story newscast. It must be concluded,
therefore, that there is nothing in this evidence, so far as the factor of
memory goes, to lead a newscaster to set an arbitrary limit below 40
items in a 12½-minute newscast if his material justifies that many items.
The factor of audience preference, however, bears strongly on this point, 
as the next section of this report will show. 
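The arithmetic behind the story counts in this paragraph can be checked directly from the mean percentages in Table 1. The sketch below is illustrative only: it treats the A casts as 20 items and the B casts as 30 items (the table notes that one item was dropped from scoring in two sets), and the names are hypothetical.

```python
# Mean per cent remembered, keyed by number of stories in the cast (Table 1).
mean_percent = {20: 54.5, 30: 49.3, 40: 45.9}


def stories_remembered(n_stories: int, mean_pct: float) -> float:
    """Expected number of stories remembered in a cast of n_stories."""
    return n_stories * mean_pct / 100.0


for n_stories, pct in mean_percent.items():
    count = stories_remembered(n_stories, pct)
    print(f"{n_stories}-story cast: about {count:.1f} stories remembered")
```

The proportion falls as the cast is packed more tightly, yet the absolute count rises, which is the point the paragraph makes with its rounded figures of 11, 15, and 18.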

For what it is worth, these figures suggest that a listener remembers,
a few minutes after a newscast has been heard, about half the items in
the newscast. This suggests several related questions, such as the kinds
of material that are remembered best, the effect of repetition, and the
kinds of cues that arouse the best learning response in news listening.
These questions are discussed in sections III, IV, and V of this report.


Results: II. Preference and the Number of Stories 


A preference question was asked at the end of each pair of casts.
Results are shown in Table 4.


Table 4
Preferences for Broadcasts

Cast preferred            N     %     Probability that True % = 50
20 items (over 30)        49    52
30 items (over 20)        45    48    .3483
20 items (over 40)       135    74
40 items (over 20)        48    26    .0000
30 items (over 40)        83    66
40 items (over 30)        43    34    .0001


These figures indicate that the broadcast with 40 stories was clearly
liked less than those of 20 and 30 stories. Approximately three out of
four people preferred the 20-item casts to the 40-item casts. Almost
exactly two out of three persons preferred the 30-item casts to the
40-item casts. The slight preference for 20 items as compared with 30
is statistically insignificant. These figures suggest that the memory
effort involved in listening attentively to a 40-item newscast, though
possible for the average listener, is not popular; and this provides
a reason for the newscaster to limit his number of items to 30, perhaps
even to 20.
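The report does not say how the probabilities in Table 4 were obtained. One plausible reading, offered here only as an assumption, is a one-tailed normal approximation testing whether the true preference share is 50 per cent. The sketch below implements that reading; it is not claimed to reproduce the authors' computation, and its results need not match the tabled figures exactly.

```python
import math


def preference_p_value(n_prefer_a: int, n_prefer_b: int) -> float:
    """One-tailed probability that a true 50/50 split yields a share for A
    at least as large as the one observed (normal approximation)."""
    n = n_prefer_a + n_prefer_b
    observed_share = n_prefer_a / n
    standard_error = math.sqrt(0.25 / n)  # SE of a proportion when the true share is .50
    z = (observed_share - 0.5) / standard_error
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail area of the normal curve


# Preference counts as they appear in Table 4.
for label, a, b in [("20 vs 30", 49, 45), ("20 vs 40", 135, 48), ("30 vs 40", 83, 43)]:
    print(label, f"{preference_p_value(a, b):.4f}")
```

Under this assumption the 20-versus-30 comparison is far from significant, while both comparisons involving the 40-item cast are highly significant, which agrees with the direction of the conclusions drawn in the text.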


Results: III. Memory and Repetition of News Facts

[...] questions were so designed as to repeat facts, to be tested,
[...] in one cast than in another. When test results on these questions
[...] there is no significant trend discernible, as Table 5 shows.
On the basis of these results, the hypothesis can be advanced that
repetition [...] significant effect on audience memory
[...] that uncontrolled variables entered [...] may have some [...]


Table 5
Per Cent of Listeners Answering Question Correctly, Compared with
Number of Times Answer Was Repeated in Cast

     Cast IA              Cast IB              Cast IC
Times Rep.    %      Times Rep.    %      Times Rep.    %
    3        91          3        82          2        80
    2        78          2        54          1        76
    2        60          1        54          1        59
    4        51          3        52          3        48
    2        88     [illegible]   59          1        62
    2        84          2        64          1        53
    3        81          2        79          2        77
    3        78          2        68          1        47

     Cast IIA             Cast IIB             Cast IIC
Times Rep.    %      Times Rep.    %      Times Rep.    %
    3        86          2        90          2        80
    2        85          1        61          1        64
    2        60          1        62          1        54
    3        66          1        68          1        49
    2        60          2        58          1        35
    2        56          1        57          1        55
    3        56          2        44          1        35
    3        91          1        89          1        97


It has been shown that there is a slight tendency for memory to be better 
for any single story in a cast with 20 stories as compared to the cast with 
40 stories. Since, in spite of this, the statistical results show repetition
to be of slight if any importance, there is all the more reason to doubt the
effectiveness of this kind of repetition. It must be remembered, of
course, that this was not overt or enforced repetition; it was not done in
the jangling fashion of “LS/MFT” or even in the style, “I'll repeat that
name again.” Those signposts may make repetition more effective in
creating a response that leads to memory. Furthermore, it may be that 
repetition is more effective in other repetitive situations—for example, 
when a story is heard on more than one newscast. 


Results: IV. Memory and Subject Matter 


The questions were divided by subject matter, and test scores com- 
pared on that basis. This is not an artificial division, inasmuch as most 
newscasts are compartmentalized by some kind of subject matter dis- 
tinctions. The results are shown in Table 6. 




Because of the small number of human interest questions, the difference
between that score and others is not statistically significant.
Between the mean per cents right on spectacular events and public
affairs, the difference is significant at the 5% level; between public affairs
and name items, at the 1% level.¹ It appears, then, that name items are
hardest to remember; that public affairs items are harder to remember
than stories of fires, windstorms, wrecks, murders, lynchings, and other
spectacular events; and further tests may show that human interest items
are easiest of all to remember.


Table 6
Average Scores on Questions Classified by Content

                       No. of Stories   Mean %   σ of mean %
Human Interest               11           85        3.21
Spectacular Events           46           72        2.42
Public Affairs               36           63        2.56
Name Items                   79           53        1.94
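The significance tests on Table 6 can be approximated from the tabled means and standard errors. The sketch below computes a simple critical ratio (difference of means over the root of the summed squared standard errors) together with degrees of freedom of n1 + n2 - 2. The choice of an unpooled ratio is an assumption; the report's footnote gives t = 2.51 (80 d.f.) and t = 2.99 (113 d.f.), and this approximation comes out somewhat higher, so the authors presumably computed their ratios a little differently.

```python
import math


def critical_ratio(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Difference between two means divided by the standard error of the difference."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def degrees_of_freedom(n_a: int, n_b: int) -> int:
    """Degrees of freedom for comparing two independent groups of questions."""
    return n_a + n_b - 2


# (number of stories, mean %, standard error of mean %) from Table 6
spectacular = (46, 72, 2.42)
public_affairs = (36, 63, 2.56)
name_items = (79, 53, 1.94)

for high, low in [(spectacular, public_affairs), (public_affairs, name_items)]:
    ratio = critical_ratio(high[1], high[2], low[1], low[2])
    df = degrees_of_freedom(high[0], low[0])
    print(f"ratio = {ratio:.2f}, d.f. = {df}")
```

The degrees of freedom agree with the footnote, which supports reading the "No. of Stories" column as the number of questions in each class.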


Results: V. Memory and Mass Audiences 


One of the sets of tests was analyzed on the basis of how well each
question discriminated between persons who did well on their two tests,
and therefore may be supposed to have good memories, and persons who
did poorly on their two tests, and therefore may be supposed to have less
good memories. In order to do this, each question was ranked according
to the difference between the number of participants below median and
number above median. When this was done, the middle half of the
questions was discarded and attention focussed on the highest and lowest
quartiles. It was assumed that the top quartile contains questions which
most clearly show the difference between the best and the poorest
memories; therefore, that the material being tested in this quartile is
material which is more difficult and less well adapted to mass audiences.
It was assumed also that the lowest quartile contains questions which
least clearly show the difference between best and poorest memories;
[...] material being tested in this quartile is
[...] audiences. The material in the two
[...] both in terms of subject matter and of the
[...] in framing that question. Table 7
[...] part of that analysis.
¹ Spectacular events and public affairs: t = 2.51, with 80 degrees of freedom. Public
affairs and name items: t = 2.99 with 113 degrees of freedom.
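The ranking procedure described in section V can be sketched as follows. The sketch assumes that the quantity compared for each question is the number of correct answers among listeners above the median total score minus the number among listeners below it; the question labels and counts are invented for illustration and are not data from the study.

```python
def discrimination_quartiles(correct_above: dict, correct_below: dict):
    """Rank questions by (correct among above-median listeners) minus
    (correct among below-median listeners), discard the middle half, and
    return the most and least discriminating quarters."""
    differences = {q: correct_above[q] - correct_below[q] for q in correct_above}
    ranked = sorted(differences, key=lambda q: differences[q], reverse=True)
    quarter = len(ranked) // 4
    most_discriminating = ranked[:quarter]
    least_discriminating = ranked[len(ranked) - quarter:]
    return most_discriminating, least_discriminating


# Hypothetical counts of correct answers for eight questions.
above = {"q1": 30, "q2": 28, "q3": 25, "q4": 29, "q5": 27, "q6": 20, "q7": 26, "q8": 24}
below = {"q1": 10, "q2": 26, "q3": 15, "q4": 28, "q5": 20, "q6": 12, "q7": 26, "q8": 18}

top, bottom = discrimination_quartiles(above, below)
print("most discriminating:", top)
print("least discriminating:", bottom)
```

Questions in the returned bottom quarter are the ones answered about equally well by both halves of the audience, which is the sense in which the report calls such material suited to a mass audience.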




On the basis of this analysis and further examination of items and 
questions, several hypotheses may be set forth. 

For one thing, public affairs appears to be the subject matter which 
chiefly discriminates between listeners who have good memories and 
listeners who do not; whereas materials involving crime, disaster, and 
human interest are remembered almost as well by poor memories as 
by good ones. 


Table 7
Analysis of Items Which Proved to be Most and Least Discriminatory
Between Good and Poor Memories

Subject Matter    Kind of Information Required by Question    Locale      Cues

(Highest quartile, most discriminatory)
Public affairs    Details of political action                 Foreign     Russia
Public affairs    Details of violent action                   Foreign     Palestine
Public affairs    Names plus details of political action      Foreign     Inter-American affairs
Public affairs    Details of economic policy                  National    Taxes
Public affairs    Names of political classifications          National    Politics
Disaster          Details of accident                         National    Airplane
Disaster          Details of accident                         Regional    Mayor of nearby city
Public affairs    Details of political action                 Regional    Methodist minister
Human interest    Names and cities                            Regional    State bar association
Human interest    Name of war                                 Regional    Civil War

(Lowest quartile, least discriminatory)
Public affairs    Details of quotation                        National    Forrestal, Alaska, Russia
Disaster          Details of cause of fire                    National    Children, fire
[illegible]       Details of cause of accident                Regional    Old man, teacher
Public affairs    Details of cause of strike                  Regional    Strike, name of nearby town
Crime             Details of violent action                   Regional    Negro, lynching, murder
[illegible]       Name of town                                Regional    Escaped convict
[illegible]       Name of person                              Regional    State farmers union
[illegible]       Details of prize won                        Regional    Hollywood, Cinderella story
Human interest    Details of divorce action                   National    Hollywood, name of well-known star
Crime             Details, nature of crime                    National    American sailors, assault




There is a slight indication that names may discriminate more than
details, but the essential difference seems rather to be the kind of detail.
Political detail seems to be more discriminatory than sensational detail.
It may well be that such a combination as foreign names and political
detail, as in one of the upper quartile questions, may put the greatest
challenge to listeners' memories.

It appears that events far removed in locale are more likely to
discriminate between good and poor memories than events near at hand.
It will be noticed that there are no foreign stories in the lowest quartile,
and that one of the stories there classified as “national” is about
Hollywood, a locale which mass communications have brought next door to
all America.
One theory of radio news listening is that the listener puts into effect
a selective mechanism to parallel the newspaper reader's use of
headlines or the magazine reader's use of the table of contents. That is,
it is conjectured that the radio audience listens at a rather low level of
attentiveness until he hears an “index” word or phrase which triggers a
response, raises the level of attention, and causes perception to take
place. With this theory in mind, it is interesting to look at the column
of cues in Table 7. These are the words which seem to “stick out”
of the stories, the ones which might serve as index words or cues to
[...] response in case that process is the one in effect. It will be
noted that the stories in the lowest quartile have a high incidence of
sensational or familiar cues: Hollywood, Danny Kaye, children,
old age, strike, lynching, murder, escaped convict, towns nearby.
Stories in the highest quartile, on the other hand, have rather more
[...]ated cue words: the Inter-American situation, politics, taxes, [...]

Insofar as the memory factor goes, then, it would seem to be possible to
write a formula for a newscaster's approach to the lowest common
denominator, and therefore to a mass audience. That formula would
be much the same as the one used by many newspapers which have
reached mass circulations: sensation, crime, disaster, human interest;
public affairs subordinated or treated in a sensational manner; an
emphasis on nearby places and familiar names; and a plentiful sprinkling
of interest-attracting words, names, and phrases.

A word of caution may be unnecessary here. Nevertheless, the
investigators wish to make it clear that they do not consider these facts to
be a reason for subordinating public affairs in newscasts, or for
sensationalizing and infantizing all copy on the grounds that such is the least
common denominator of the mass audience and radio is a mass medium.
That is no more required of a newscast than it is required of all
newspapers. Nor is it the import of these results. Rather, these results
point to further study of the use of public affairs news on the air: how it
may be made useful and effective for the part of the audience which
needs it, without loss of either truth or dignity; the extent and connection
to which names and details can be used when important for the audience's
information; and the boundaries, if any, between kinds of material which
can best be presented to the ear or to the eye. Questions like these will
yield to experimental approach, and radio will grow in its public service
if the results of such experiments can be incorporated into practice.


Summary 


1. An audience remembers a proportionately smaller percentage of 
the items in a 15-minute newscast as the number of items is increased 
from 20 to 30 to 40. This difference, however, is slight—so slight that 
actually more items are remembered from the 30-item newscast than from 
the 20, more from the 40 than from the 30. 

2. An audience has a decided preference, however, for newscasts with
20 or 30 items over one with 40 items.

3. Repetition of facts within a newscast has not been shown to have
a significant effect on audience memory.

4. Human interest and spectacular stories of crime and disaster are
remembered better than are stories of public affairs.

5. Insofar as the factor tested is concerned, the appeal to a mass
audience by radio news is similar to the appeal of certain sensational
newspapers which have reached mass audiences. Results of this study
indicate that human interest and spectacular events are remembered by
the mass audience, whereas such serious subject matter as public affairs
is remembered less well by the part of the population which is not gifted
with good memories. Nearby events are more likely to be remembered
by the mass audience than events of distant origin. Details and names
do not make for mass remembrance, and details of political events and
foreign names in a public affairs story are especially hard to remember.
“Index words” of a sensational or familiar nature are also helpful in
penetrating the memories of the mass audience.

Received October 4, 1948.


Tables for Use with the Flesch Readability Formulas


James N. Farr and James J. Jenkins 
University of Minnesota 


Increasing emphasis is being given to measurements of the readability
of communications in many fields. A promising approach which is
being widely used and studied is that set forth by Flesch,¹ which involves
the use of syllable counts, sentence lengths, percentage of personal words
and percentage of personal sentences to yield two indexes. One index is
“Reading Ease” or level of difficulty and the other is “Human Interest.”
In order to facilitate the use of these formulas, the writers have tabled
values for them. The tables are simple to use. Table 1, “Reading
Ease,” is entered vertically by average sentence length and horizontally
by the number of syllables per one hundred words. The index figure is
found at the point of intersection of the row and column entries. For
example, if a sample of one hundred words contains 138 syllables and has
an average sentence length of 25 words, the “Reading Ease” index
[...] 69. This index number may then be interpreted directly in
terms of difficulty by Flesch's table.²

In like manner the “Human Interest” table (Table 2) is entered
vertically by percentage of personal sentences and horizontally by the
percentage of personal words. If a sample
has thirteen personal words per one hundred words and ten per cent
of the sentences are personal, the “Human Interest” index is equal
to [...]. This figure may be directly interpreted in terms of interest by
Flesch's table.²
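The note itself does not print the formulas that the tables evaluate. As a sketch, assuming the coefficients of Flesch's 1948 formulas (Reading Ease = 206.835 - 0.846 × syllables per 100 words - 1.015 × average sentence length; Human Interest = 3.635 × per cent personal words + 0.314 × per cent personal sentences), the two indexes can be computed directly. The function names are hypothetical.

```python
def reading_ease(syllables_per_100_words: float, avg_sentence_length: float) -> float:
    """Flesch Reading Ease index (coefficients assumed from the 1948 formula)."""
    return 206.835 - 0.846 * syllables_per_100_words - 1.015 * avg_sentence_length


def human_interest(pct_personal_words: float, pct_personal_sentences: float) -> float:
    """Flesch Human Interest index (coefficients assumed from the 1948 formula)."""
    return 3.635 * pct_personal_words + 0.314 * pct_personal_sentences


# Thirteen personal words per hundred words and ten per cent personal sentences,
# the Human Interest case described in the text.
print(round(human_interest(13, 10)))

# An arbitrary Reading Ease lookup: 150 syllables per hundred words, 20-word sentences.
print(round(reading_ease(150, 20)))
```

A table lookup and a direct computation should agree to the nearest whole number, since the tabled entries are the formula values rounded.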

Several checks were made to insure the accuracy of the tables. The
[...] edge indexes were computed separately by the writers. One writer
[...] the tabled values by use of the subtractive constant for columns;
[...] used the subtractive constant for rows. Both [...]
by use of the subtractive constant for diagonals.

[...] the formulas are both straight-line functions,
[...] constructed for use in situations where only approximations
[...].


¹ Flesch, R. The art of plain talk. New York: Harper and Brothers, 1946.
² Flesch, R. A new readability yardstick. J. appl. Psychol., 1948, 32, 221-233.
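The "subtractive constant" checks mentioned above follow from the formulas being straight-line functions: moving one column, one row, or one diagonal step in the table always changes the index by the same amount. The sketch below illustrates this for Reading Ease, assuming the 1948 coefficients (which this note does not print); the names are hypothetical.

```python
def reading_ease(syllables_per_100_words: float, avg_sentence_length: float) -> float:
    """Flesch Reading Ease index (coefficients assumed from the 1948 formula)."""
    return 206.835 - 0.846 * syllables_per_100_words - 1.015 * avg_sentence_length


base = reading_ease(150, 20)

# One more syllable per hundred words (next column): index drops by 0.846.
column_step = base - reading_ease(151, 20)

# One more word per sentence (next row): index drops by 1.015.
row_step = base - reading_ease(150, 21)

# One step along the diagonal: both changes at once, 0.846 + 1.015.
diagonal_step = base - reading_ease(151, 21)

print(f"{column_step:.3f} {row_step:.3f} {diagonal_step:.3f}")
```

Because the steps are constant everywhere in the table, a value reached by repeated subtraction along a row, a column, and a diagonal must agree, which is what makes the three checks independent confirmations of one another.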


Table 1
Flesch Reading Ease Index Table
(Rows: Average Sentence Length. Columns: Syllable Count per Hundred Words.)
[Table values not recoverable from the scan.]


Table 2
Flesch Human Interest Index Table
(Rows: Percentage of Personal Sentences. Columns: Percentage of Personal Words.)
[Table values not recoverable in order from the scan.]

* X indicates 100 or [...]


Inasmuch as these tables permit rapid and accurate determination of
the Flesch index values and eliminate virtually all calculations previously
involved, it is hoped that more research on the applicability and utility
of the formulas will be undertaken.

Received February 26, 1949.
Early publication.


Book Reviews 


[...], Earl M., and Dawson, Frances Trigg. Counseling employees.
New York: Prentice-Hall, 1948. Pp. xi+247. $4.00.

The authors state that this book is an answer to the well founded
need of employee counselors for a handbook written by practical people
in a down to earth style. Many readers will disagree.

Psychologists are not apt to be favorably impressed by a twenty-five
[...] merit rating scale, consideration of Cardall's Practical Judgment
Test as a personality test, statements such as “It is not unusual for a
counselor to be called Mr. Anthony,” and frequent use of generalities
for which little if any experimental evidence is cited or available.
Industrialists are not apt to agree that handicapped persons should be
[...] to prevent them from developing competitive companies, and
[...] current job salaries are low.

This reviewer does not believe that publication of the book will
improve the theory or practice of counseling.

C. E. Jurgensen
Minneapolis Gas Company


Kessler, Henry H., M.D. Rehabilitation of the physically handicapped.
New York: Columbia University Press, 1947. Pp. 251. $3.50.

Associated with the New Jersey Rehabilitation Commission from 1919
[...] Navy to continue his rehabilitation
activities, Dr. Henry H. Kessler has had a peculiar opportunity to
participate in an integrated approach to the problem of seeing a person
through from illness or injury to a job, in a word, rehabilitation.
Rehabilitation of the physically handicapped is a general survey of the
problems [...] services that constitute an adequate rehabilitation
program. The author presents his interpretation of the needs
of the disabled and the many unsolved problems in [...] field.

The book is divided into four general sections. Part one describes
the problems of the physically handicapped in general with special
treatment of the crippled child, injured worker, disabled veteran and the
chronic disabled. Social attitudes and legislation have in general
crystallized around these groups of handicapped persons. After a critical
review of the concept of physical fitness the author concludes that “physical
disability has no meaning except as it refers to what an individual does
to solve his own problems and what private and public agencies will do
for him in easing that burden.” Social prejudice is identified as one of
the major problems confronting the handicapped.

The second section contains a discussion of the services that form the 
basic structure of vocational rehabilitation, namely, physical restoration, 
vocational guidance, vocational training and selective placement. Voca- 
tional rehabilitation would be considerably enhanced if a majority of the 
medical profession were equally conversant with these fields. 

In part three Dr. Kessler describes rehabilitation in practice. Al- 
though the author’s role is that of an active orthopedic surgeon, his in- 
sight regarding the whole man is constant and he has the capacity to 
convey this perspective to the reader. This section includes discussion 
of the mentally and emotionally disabled, the orthopedic patient, the 
blind and the deaf and the medical and surgical invalids. 

The final section includes a cursory review of the legislative and 
administrative organization of a few of the existing programs for the 
handicapped. The final chapter contains the author’s remedies for the 
problems pointed up so clearly throughout the book. Inadequate
rehabilitation is due primarily to “the lack of public and professional
knowledge of their possibilities (handicapped) and because of the
ignorance of facilities that are already available to them.” His major
proposal is a uniform, compulsory, lifetime health record in the hands of
state departments of health which would require annual reports from the
individual or his physician and would urge him “to have his defects
corrected by his private physician or by public facilities.” Disability
pensions are advocated for those who cannot be rehabilitated.

Donald H. Dabelstein
Washington, D. C.


Yoder, Dale. Personnel management and industrial relations. (3d ed.) 

New York: Prentice-Hall, Inc., 1948. Pp. xi+894. $5.00. 

For readers familiar with the two previous editions it is sufficient to say 
that this latest edition maintains the same high quality and thoroughness 
but is larger, longer, and brought up to date. Addition of materials and 
developments from the war and post-war period has expanded the dis- 
cussions on nearly all topics and particularly on selection, wage problems, 
stabilization of employment, personnel records, and the legal aspects of 
collective bargaining. When an established authority in the field sees 
fit to expand his treatment of certain areas it can be used as a rough 


indication of how the field itself is developing and what problems are 
receiving major attention. 


Book Reviews 281 


major characteristics of this edition are the same as those of the 
editions, viz., recognition of the importance of manpower in 
istrial system, growth of personnel management as a profession, 
t of the historical development as well as the present status 
ersonnel function being discussed, consideration of personnel 
nent problems from many aspects (economic, psychological, 
cal, legal), inclusion of a chapter on statistics and reference. to 
riate statistical procedures for each topic, emphasis upon research 
int and methods, thought provoking exercises and review ques- 

the end of each chapter, and a thorough, wide coverage of the 
in the field. The literature coverage point should be em- 
since, in addition to footnote references on nearly every page 
collateral readings at the end of each chapter, there is a list of 
ch agencies, 56 journals, and 6 reporting services. The refer- 
are quite up to date, nearly all from 1940 on and extending into 


logical viewpoint and findings are permeating the field of manage- 
On the other hand, he doesn’t want his findings so thoroughly 
| as to eliminate the need for his courses on personnel and industrial 
This tendency puts the author of a book on personnel 
ntin a related dilemma. If he doesn’t give the psychologist his 
\e is criticized; if he includes too much, the psychologist may say, 
‘ow’re in my bailiwick.” 
er has handled this ticklish situation rather well. Of all texts 
sonnel management with which this reviewer is familiar, Yoder’s 


most clearly reveals the impact of psychological findings upon management
principles and procedures, particularly in the areas of selection,
, morale and incentives. The importance of individual differences,
interpersonal relationships, and social psychology is stressed in
issions. Many of his supporting references are from psychological
L There is still need, however, for a complementary study of
psychological techniques per se and for a thorough treatment of
psychological studies merely referred to in the text. Yoder’s text
will serve both to arouse management’s interest in industrial psychology
and to help industrial psychologists understand how their contri-
i tical situation.
E o ens could be made, such as the rather poor selection
ability tests listed as representative of the field (225 n), giving
the title of Shartle’s book as “Job Analysis” instead of “Occupational
Information” (121 n), and stating that the G. I. Bill provides a maximum




of three instead of four years of training (253 n). The only major 
weakness apparent to this reviewer was the rather casual treatment of 
supervision. 

Albert S. Thompson
Vanderbilt University 


Pigors, Paul, and Myers, Charles A. Personnel administration: a point 
of view and a method. New York: McGraw-Hill Book Co., Inc., 1947. 
Pp. ix+553. $4.50. 

This exposition of personnel administration is well organized. Sec- 
tion A (3 chapters) presents the broad function of the personnel adminis- 
trator, his place in management and the ‘‘personnel point of view,” based 
on a recognition of the worker’s need for both personal development 
and social relationships. Section B (3 chapters) presents a method of 
understanding and solving personnel problems, involving systematic 
consideration of four elements in the situation: (1) technical features, 
(2) the human element, (3) principles and policies, (4) the time factor.
The use of this “method of situational thinking” by the personnel admin- 
istrator as a staff officer is described in some detail and the integration 
of both “person-centered” and “policy-centered” approaches is empha- 
sized. A separate chapter describes the interview as a basic tool in 
investigation.

Since the personnel point of view stresses individual worker and 
work-team adjustment and efficiency, Section C (3 chapters) discusses 
the personnel administrator’s function in diagnosing organizational sta- 
bility through studying employee morale. Indices discussed are pro- 
duction, absenteeism, accidents, turnover, and complaints and grievances. 
The remaining sections in Part I apply the personnel point of view and 
method of approach to the standard problems in personnel administra- 
tion. Twelve chapters deal successively with selection, training, em- 
ployee rating, transfer and promotion, discipline, wages, hours, employee 
services, etc. Chapter 22 summarizes the personnel point of view.

The last third of the book (Part II) presents Case Illustrations sup- 
plementing the chapter discussions in Part I. Nineteen cases, ranging 
from 3 to 16 pages each, are given in considerable detail, including back- 
ground, interview or descriptive data, and interpolated discussion ques- 
tions and interpretation. Appendices include brief descriptions of 
the Western Electric Research Program and the Job Relations Training 
Program of the TWI and a summary of an Employee-Service Program. 
A Selected References section listing nearly 600 references grouped ac- 
cording to the chapters in Part I, an Index of Names referred to in Part 
I (but not in the Selected References), and a fairly detailed Subject 
Index conclude the volume. 




This book represents a major contribution to the field and profession
of personnel administration. The authors have been able to formulate
interestingly and clearly the basic philosophy of personnel work and to
its significance in modern industrial society. It should result in
line management seeing its problems in a new light and, if studied
by beginners, will help create a new generation of personnel administrators
to their responsibilities. The presentation is particularly strong
in its exposition of the staff function of personnel administration, in its
to the investigation of personnel problems, in its recognition of the
interrelationships between technical and human problems and between
person-centered and policy-centered considerations, and in the need for
constant appreciation of employee attitudes as a factor in the situation.
The crucial position of the supervisor in labor-management relations is
t and the effect of unionization on personnel practices is evident in
discussions.
The difficulty in evaluating this volume is that it differs from most
texts on personnel administration. The sub-title “A Point of View and
a Method” describes it nicely for that is just what it does, i.e., it proposes
and expounds a frame of reference and a method of approach to the
understanding and solution of personnel problems in industry. But, by
its nature, it stops there. Although it is an excellent How to Go
About It manual, it is rather weak on What Has Been Done or on How
to Do It. There is little attempt to survey the “facts,” the “pro-
» and the “program” in the standard topics of fatigue, rest
, job analysis, job evaluation, labor force characteristics, measurement
of employee attitudes, labor laws, employee counseling, personnel
record keeping, etc. The Selected References are probably intended to
[the student where to go for this type of information but, if so, the
is weak in spots, particularly with respect to the contributions of
ia The references to psychological literature are
the AMA publications; only a


s, is equally important, and, in Fe 
ound data upon which the method of 




In brief, Pigors and Myers have presented an excellent statement of 
the personnel point of view and a useful guide for applying its funda- 
mental principles to everyday problems in personnel work. To obtain 
a well-rounded background for personnel administrators, the student 
will also need a thorough grounding in research procedures and an ex- 
tensive factual survey of present knowledge in the field, particularly as 
revealed in psychological research. 


Albert S. Thompson
Vanderbilt University 


Doob, L. W., Public opinion and propaganda. New York: Henry Holt 
and Co., 1948, pp. vii-600, $3.75. 


To prevent possible disappointment any prospective reader should 

Doob’s objective. It was not his purpose to review and 

evaluate the relatively quantitative studies that have been conducted. 

The principal purpose appears to be an attempt to explain public opinion

and propaganda in terms of selected principles of human behavior. 
Obviously this is quite an undertaking. 

In line with this objective, the first group of chapters presents a back- 
ground and explains such concepts as consistency, rationalization, dis- 
placement, compensation, projection, identification, conformity, and 
simplification. 

This is followed by a short outline of “principles of public opinion.” 
The qualifications which must be attached to the set of principles are 
stated honestly: the concepts which form the basis of the principles are 
merely and as such are descriptive only; in view of the 


this set of principles has been drawn, it is premature to propose any 


to caution as forcefully as possible against premature generalizations 
and glibness” (p. 89).
Analyzing results obtained from studies of public opinion was not a
principal objective. In fact the “exotic or mundane results obtained
by measuring public opinion” are used only incidentally “to indicate
difficulties and techniques of measurement” (p. iii).
The second group of chapters places emphasis on the methods
of conducting public opinion studies rather than on an analysis of
results obtained. This naturally leads to the consideration of
such problems as the nature of the sample; the method of specific assignment
(area-type probability sampling) vs. quota sampling; size of sample;




ng problems; the technique of questioning; reliability; evaluation
of public opinion polls; and such intensive measures of public opinion
as panel studies, open-ended interviews, attitude scales, and prolonged
ve) interviewing.
These chapters evidently represent an attempt to explain the techniques
of sampling and measuring public opinion in such simple terms
that any one can understand them. The reader who feels the urge to
point out what might appear to be inadequate treatment of these subjects
should remind himself of the difficulty of reducing the explanation to
the simplest possible terms. For example, some readers might object
to such statements as “public opinion polls usually draw their samples
not completely at random but at random from within specified strata of
the population which have been determined on the basis of attributes
related to the particular poll in question” (p. 119). Any one familiar
with the way that the most popular polls actually have been conducted
might very well question the statement that the selection within strata is
at random. However, the point is that the contribution provided
by reducing the explanation to a very elementary level probably more
than justifies what some readers might regard as inadequate treatment.
There is one point, however, for which the reviewer cannot find a
legitimate excuse. In effect, Doob accuses both Gallup and the Psychological
Corporation of wording questions to get results which will please
clients (p. 157). Such a statement is so farfetched that it suggests
lack of close practical touch with the ways in which such organizations


This is merely one example which contributes to the impression that
in attempting to cover the numerous specialized fields which make up
the general field of public opinion and propaganda, Doob has been forced
to rely on reading widely scattered sources rather than depending upon
experience in each of the fields. His treatment of the field of
advertising provides a good example. Obviously his practical experience
in this field has been very limited, and his contacts with what has been
done in advertising research evidently have not been very close. Yet
he does not hesitate to make such statements as: “They (the radio industry)
finance polls which purport to show by means of somewhat
biased questions that people really like to listen to advertising” (p. 491)
in reference to the Field-Lazarsfeld study. The “somewhat biased
questions” accusation is neither explained nor supported by any evidence.
The book covers a wide variety of topics in addition to the ones
previously mentioned, including: the importance of public opinion; ted
of propaganda; such concepts as stimulus intensity, percepi ;
, perceptual variation, stimulus simplification, reinforcement,




drive reduction, and primacy; the media used for propaganda purposes; 
and a final summary on the value of analysis, including an outline to serve 
as a guide in collecting the information needed for a relatively adequate 
analysis. 

What the reader can and cannot learn from the discussion of these 
topics has been suggested previously. In general, the less the reader 
knows, the more he will get out of this book. The beginning student 
will get a relatively quick survey of a wide variety of topics, and the 
reader who is highly specialized in one field will get at least a surface 
understanding of the other fields. However, any one with a fairly good 
grasp of the whole field will find little of interest beyond a few of Doob’s 
personal opinions, and he is likely to feel that the material is fairly thin. 

None of these statements is intended as a criticism of the way Doob 
has approached the difficult problem of covering the whole field of public 
opinion and propaganda in a single book. To attempt to cover every- 
thing from the mechanics of polling to philosophical considerations, with 
a set of principles included, is a very difficult task. The field is made 
up mainly of specialists, with each group working in its own specific field 
and using methods on various levels of accuracy. Coordination is 
needed. Doob has made a pioneering effort to draw together the 
scattered threads. For this reason, his book may interest many readers. 

Alfred C. Welch 

Knox Reeves Advertising, Inc.

Minneapolis, Minn. 


Rudolph, Harold J., Attention and interest factors in advertising. New 

York: Funk and Wagnalls, 1947. Pp. 119. $7.50.

For many years Daniel Starch and staff have compiled magazine 
readership ratings using the recognition method. The question fre- 
quently has been asked, just what do the Starch studies prove? Mr. 
Rudolph attempts to answer this question, at least in part, and in so doing 
has presented research findings which throw considerable light on the 
relative value of a number of present-day advertising techniques. The 
author states, “the objective of this book was to set forth the elements 
which contribute to the attention and interest of magazine advertisements 
and to determine, as far as possible, the extent of each separate influence.” 

Consumers’ reactions to 2,500 different half and full page advertise- 
ments appearing in The Saturday Evening Post between the years 1935 
through 1939 make up the original data for the studies reported. In 
analyzing these data Mr. Rudolph exhibits an unusual ability to isolate 
and control a surprising number of factors in advertising. If the book 
did nothing more than show how such extremely complex data can be 




under scientific control it would have served a worthwhile
purpose; but it does more than that. It produces answers to more than
advertising problems, such as relative value of half and full page
best “spot” for a headline, sex preferences for various types of
illustrations, and the like.
In certain places one wishes he had isolated a few more factors. For
example, he shows the effect of “feeling tone” on attention value in conventional
type advertisements but does not show its influence on readership,
where one might expect its greatest contribution. In other places
Mr. Rudolph’s genius for isolation of individual factors is inadequate
and a considerable number of influences operate in an unknown manner.
An example of this is his analysis of the problem of “static” vs. “action”
pictures. All “static pictures” are lumped together to compare with a
lumping of all “action pictures” which leaves interest, artistic
pictorial techniques, and a host of other factors unanalyzed. One
must assume that these unanalyzed factors are equally distributed in
both types of pictures, which is really a large assumption.
The statistically trained research worker reading this book will find
the lack of “N’s” and measures of significance of differences a serious
shortcoming. The following apology is given by the author: “unfortunately,
most of the records pertaining to this investigation were destroyed
when the company (J. Stirling Getchel, Inc.) went out of business. For
this reason it is not possible to show the number of advertisements involved
in each separate comparison.” It is unfortunate such valuable
data were destroyed.
While the author emphasizes the specificity of the problems dealt with,
it may be well to stress further the fact that the techniques investigated
are concerned primarily with the mechanical aspects of advertising. If
one believes mechanical perfection will make a successful advertisement,
then this book is an extremely important contribution to advertising.
If, however, one follows Kenneth Goode, H. C. Link, and others who hold
that mechanical factors, while important, are decidedly secondary to the
advertiser’s ability to tap deep undercurrents of human motives, then the
importance of this book is whittled down considerably.

SR Pada
University of Minnesota


Selekman, Benjamin M. Labor relations and human relations.
ge, Massachusetts: McGraw-Hill Book Company, 1947, pp.
+ 225.

Another appropriate title for Selekman’s book would be A Psychology
of Labor Relations and it is indeed a reflection upon psychologists as a




group that one of them has not come forward to write a volume on this 
subject. Granted that a sizable body of empirical facts has not been 
developed in this area, it is heartening to see someone strike out and 
prepare an exploration of the human relationships involved in negotiating 
and living under a union agreement. While other authors have discussed 
the subject, this organized treatment is long overdue. 

Selekman paints the current picture of strife in the world of union 
relations and raises the questions of “why” and “what can be done about 
it?” “How can we achieve in daily shop behavior the cooperation nec- 
essary for realizing both full production and maximum human satisfac- 
tion?” His answer traces the emotional reactions of both union and 
management men from the time a union enters the industrial scene as an
organizing unit, through the negotiation of the first agreement, to the 
problems of administering and modifying the agreement. His plea is 
for greater common understanding of the person across the bargaining 
table as a human being with foibles and feelings, motivations and frustra- 
tions. He insists that “the capacity for conflict and cooperation lies deep 
in the human endowment.” Conflict is today’s pattern because modern 
industrial organization and local shop practices tend to hide the realities 
of interdependence which usually build spontaneous cooperation. “The
discovery of methods for imparting to each man at work the feeling that 
he is an indispensable part of the whole working group thus figures as a 
major problem for research and experiment.” Selekman faces squarely 
the very real bases for disagreement and suggests means of meeting the 
problems thus arising. It is interesting to relate his proposed practices 
with those found by the National Planning Association in its series, Causes 
of Industrial Peace. 

The psychologist will find many challenging questions; the last 
chapter, “Conflict and Cooperation,” presents several hypotheses which 
could serve as effective foundations for research. While he may not 
agree entirely with all of the interpretations, he will be stimulated to do 
some thinking in a much-neglected (by him) field. The book could be 
useful to the industrial psychologist for distribution to his friends in labor 
relations and in unions; they may find it a trifle hard to read but it will be 
well worth the effort. It is a book that the non-industrial psychologist 
will find valuable in understanding the potential role of the psychologist 
in union relations. 


Brent Baxter
Personnel Department
The Chesapeake and Ohio Railway Company
Cleveland, Ohio




New Books, Monographs, and Pamphlets
Books, monographs, and pamphlets for listing and possible review should be sent to


Donald G. Paterson, Editor, Department of Psychology, University 
of Minnesota, Minneapolis 14, Minnesota 


You and your mental abilities. Lorraine Bouthilet and Katharine Mann
Byrne. Chicago: Science Research Associates, 1948. Pp. 48. $.75 single
copy. $.60 for fifteen or more. $.40 for one hundred or more.
The psychology of social classes. Richard Centers. Princeton: Princeton
University Press, 1949. Pp. 256. $3.50.
The ethics of ambiguity. Simone De Beauvoir. New York: Philosophical
Library, Inc., 1948. Pp. 163. $3.00.
The people know best. Morris Ernst and David Loth. Washington,
D. C.: Public Affairs Press, 1949. Pp. 169. $2.50.
The psychology of invention in the mathematical field. Revised edition.
Jacques Hadamard. Princeton: Princeton University Press, 1949.
Pp. 145. $2.50.
Elmtown’s youth. A. B. Hollingshead. New York: John Wiley and
Sons, Inc., 1949. Pp. 420. $3.50.
ce guide to basic management training. Arthur S. Hotchkiss.
Deep River, Conn.: National Foremen’s Institute, Inc., 1949. Pp.
$5.50.
Understanding yourself. William C. Menninger. Chicago: Science Research
Associates, 1948. Pp. 52. $.75 single copy. $.60 for fifteen
or more. $.40 for one hundred or more.
Historical introduction to modern psychology. Revised edition. Gardner
Murphy. New York: Harcourt, Brace and Co., Inc., 1949. Pp. 466.
Textbook $4.50, Trade $6.00.
Machine computation of elementary statistics. Katharine Pease. New
York: Chartwell House, Inc., 1949. Pp. 238. $2.75.
The pollsters. Public opinion, politics and democratic leadership. Lindsay
Rogers. New York: Alfred A. Knopf Company, 1949. Pp. 239.
‘75.
Music and medicine. Dorothy M. Schullian and Max Schoen, Editors.
New York: Henry Schuman, Inc., 1948. Pp. 499. $6.50.
Intellectual abilities in the adolescent period. David Segel. Bulletin 1948,
No. 6, Federal Security Agency. Washington, D. C.: Superintendent of
Documents, U. S. Government Printing Office, 1948. Pp. 4.




How personalities grow. Helen Shacter. Bloomington: McKnight and 
McKnight, 1949. Pp. 256. $3.00. 

Human relationships in public health. Geddes Smith. New York: The
Commonwealth Fund, 1949. Pp. 18. $.15. 

The American soldier: Vol. 1. Adjustment during army life: Vol. 2. Combat
and its aftermath. S. A. Stouffer et al. Princeton: Princeton University
Press, 1949. Pp. 600 each. Vol. 1 and 2, $13.50. Separate,
$7.50.

Appraisal of vocational fitness by means of psychological tests. Donald E. 
Super. New York: Harper and Brothers, 1949. Pp. 780. $6.00. 
Dynamic psychology. Percival M. Symonds. New York: Appleton- 

Century-Crofts, Inc., 1949. Pp. 413. $3.75. 

Personnel selection. Test and measurement techniques. Robert L. Thorndike.
New York: John Wiley and Sons, Inc., 1949. Pp. 366. $4.00.

Perspectives in medicine. New York Academy of Medicine. New York: 
Columbia University Press, 1949. Pp. 163. $2.50. 

Research frontiers in human relations. Vol. 92, No. 5 of Proceedings of 
the American Philosophical Society. Philadelphia: American Philo- 
sophical Society, 1948. Pp. 86. $1.00. 

How to prepare an employee’s handbook. Deep River, Conn.: National 
Foremen’s Institute, Inc., 1949. Pp. over 300. $12.50. 

The new cure for white collar unrest. New York: Prentice-Hall, Inc., 
1948. Pp. 47. $1.00. 


Journal of Applied Psychology

Vol. 33, No. 4 August, 1949


A Vocational Interest Test at the Skilled Trades Level * 


Kenneth E. Clark
University of Minnesota 


The counseling of college students who plan to enter one of the professional
fields has been greatly aided by the development of the Strong
Vocational Interest Blank and the Kuder Preference Record. Widespread
use of these devices not only with college students but with high school
students, job applicants, the unemployed, and other groups has demonstrated
the usefulness of a measure of an individual’s interests in comparison
with those of successful workers in a given occupation. One of
ous limitations of these instruments, however, is the inadequate
i of occupations at the skilled and semi-skilled levels. Thus,
the Strong Vocational Interest Blank can be scored only for carpenter,
m, and policeman at these occupational levels. As a result, the
counselor is much better prepared to counsel the small minority
ial professional, semi-professional and technical workers than to
the large majority of persons planning to enter skilled, semi-skilled
and unskilled occupations, at least as far as the measurement
of interest patterns is concerned.
During World War II, the armed services placed great emphasis on
the measurement of aptitudes; little was placed on the measurement of
interests. It frequently happened that, when highly capable men
were sent to technical schools for training, school officials would often
complain that the students were not “interested.” That it would have


* This research was carried out under Contract N6ori-212, T. O. III, between the
Office of Naval Research and the University of Minnesota. This paper le on
that The writer wishes to acknowledge the work of Mr.


4 ici. "and Mr. Robert I, Hudson in the eolleetion 
8. rt Mrs. Patricia Hayes, a Mr z = 
valuable both in the planning of various 


10; Milk Driver Employees, No. 546; Painters, No. be ag ee Cement 
rs, No. 20; Plumbers, No. 34; Sh o. 16; 
0. 30; and Steam Fitters-Pipe Fitters, No. 455. 




been desirable to pay more attention to measured interests of individuals 
was generally recognized. To actually do so, in practice, was rather 
difficult. For one thing, military terminology is strange to the newly 
inducted recruit. To ask for statements of job preferences in terms of 
job titles is therefore likely to be futile. To ask for a statement of 
preferences in terms of definite types of activities is also likely to obtain 
information of doubtful value either from a civilian or a military re- 
spondent. Even were such an approach considered desirable, it is 
likely that the high level of affect among recruits would lead them to 
state preferences in terms of assignments which either keep them closer 
to home, keep them in the continental United States longer, or either 
reduce or increase their likelihood of being assigned to combat duty. The 
use of a questionnaire which could be scored to indicate the interests of an 
individual in terms of the known interest patterns of members of military 
occupational groups was not possible because such an instrument did not 
exist. It is the purpose of the present investigation to explore the 
possibilities of developing such an interest measure, usable for potential 
workers both in the occupations of enlisted men in the armed services, 
and in the corresponding civilian occupations. 


The Questionnaire 


To provide the information on preferences needed for the analysis of 
interest patterns, a 570-item questionnaire, the Minnesota Vocational In- 
terest Inventory, was prepared. Items in the questionnaire were grouped 
in three’s, making up a total of 190 triads. The individual respondent 
is asked to select from each triad of items the one activity he would like 
most, and the one he would like least, leaving the third item blank. The 
approach used is thus a forced-choice, with the respondent who follows 

directions being required to make a total of 380 choices, half of them
“like” and half of them “dislike.” 
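
The arithmetic of the triad format can be checked with a short sketch. Only the counts (570 items, blocks of three, one "most" and one "least" mark per block) come from the text; the function names and the representation of a marked block as a tuple of "+", "-", and "" are assumptions made for illustration.

```python
ITEMS = 570        # items in the inventory, as stated in the text
TRIAD_SIZE = 3     # items are grouped in three's


def count_choices(n_items: int, triad_size: int = TRIAD_SIZE) -> dict:
    """Return the number of triads and of forced choices for an inventory."""
    if n_items % triad_size != 0:
        raise ValueError("item count must divide evenly into triads")
    triads = n_items // triad_size
    # Each triad calls for exactly one "like most" and one "like least" mark.
    return {
        "triads": triads,
        "likes": triads,
        "dislikes": triads,
        "total_choices": 2 * triads,
    }


def is_valid_triad_response(marks: tuple) -> bool:
    """True when a block has exactly one '+', one '-', and one blank ('')."""
    return sorted(marks) == ["", "+", "-"]


counts = count_choices(ITEMS)
print(counts)
```

A respondent who follows the directions therefore leaves 190 items blank, one in each block.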

The items used in the inventory were selected from a variety of 
sources. First, a large number of items were written which described 
jobs or tasks making up part of a job.2 The Dictionary of Occupational 
Titles, the Manual of Navy Job Classifications (Nav Pers 15105)* and 
similar materials were scrutinized for suggestions for items. The final 
list of items used contains such activities as the following, grouped in 
three’s as shown, with the directions for marking responses as indicated: 
2 The writer wishes to acknowledge the able assistance of Josephine Welch in

3 Dictionary of Occupational Titles. Part 1, Definitions of Titles. Washington:

4 Manual of Enlisted Navy Job Classifications. Washington:
Bureau of Naval Personnel, 1945.




Directions

On the following pages you will find many activities listed. They are arranged in
blocks of three. You must make a choice in each block of the one thing you like to
do most, and the one thing you like to do least.

Mark the thing you like to do most with a plus-sign (+).
Mark the thing you like to do least with a minus-sign (-).
That leaves one of three items blank.

Example: ( ) a. Write letters.
(+) b. Fix a leaky faucet.
(-) c. Interview someone for a newspaper story.

Now turn the page and begin. Be sure to fill out all pages.


Items are grouped in three’s in a haphazard fashion. Thus, no a priori
i of scoring played any role in determining how items were combined to
form triads. As a result, the blocks of three look like the following examples:


a grocer. a. Varnish a floor. 

a printer. b. Learn to use a slide rule, 

a shop foreman. c. Repair a broken connection on an 
electric iron. 


a. Tune a piano.                      a. Putter around in a garden.
b. Cook a meal.                       b. Take part in an amateur contest.
c. Change a tire on an automobile.    c. Cook spaghetti.


high 
: b. File cards in alpha- 


ste of this difficulty, the writer believes 
me rape obvious type of choice which is 


onse, were made 
lence, and on the basis 


lective appraisal of the types of items which would work best in the situa- 


18 Where it is expected the inventory will be used. 


The Criterion Groups 
ed that the questionnaire be 




tions. It was not considered desirable to prepare keys solely in terms of 
rubrics identified by any other means. 


The first contacts with employed groups were made through various busi- 
ness and industrial organizations in Minneapolis and St. Paul, Minnesota.’ 
Willingness to cooperate in the program was expressed by personnel managers
of many of these concerns, with the reservation that the matter should be 
cleared with union representatives before any action was taken. Furthermore, 
some reluctance to use company time for the collection of data was expressed, 
along with assurances that this, indeed, was a worthy project. 

Union representatives were, accordingly, contacted, and the possibilities
of the program described to them. Union leaders were, on the whole, willing
and anxious to cooperate in any program which might eventually operate to
increase their own effectiveness in selecting apprentices for training in their
own trades. With only one exception, union groups who were contacted agreed
to aid in the assay of interest patterns of their own memberships.

How to obtain the responses of the membership still remained a major
problem. The first attempt was attendance at union meetings. The program
of research was presented to the membership with a request for their cooperation
in responding to the questionnaire during the meeting itself. Inasmuch
as completing the questionnaire required from 45 minutes to over an hour, this
effort proved to be futile. Somehow or other union meetings were not considered
by the membership a suitable time for this sort of work, and as a result,
many questionnaires were begun, but few were completed.

A second effort was made at the place of employment. A representative
of the project, accompanied by a union official, would make the rounds of
places of business, would describe the research program to the worker, who
would then fill out the questionnaire while our representative and the union
representative waited. This method was most effective, although unpopular
with the employer, and excessively time-consuming.

A third way was by use of the mails. Through the trust and cooperation
of the union leaders, it was possible to use their mailing lists, and to send
questionnaires to the membership, with a covering letter signed by the business
agent, or another official of the union. These letters asked for the cooperation
of the membership, gave a brief word about the objectives of the program, and
gave the endorsement of the local union officials to the project. A stamped
envelope was enclosed, addressed to the union office. Returns were anonymous,
with questionnaires coded in such a way as to permit follow-ups only
to those who had not yet returned their questionnaires.
Of the various methods tried for obtaining adequate coverage of workers
in a given occupational group, the mail questionnaire method proved to be
most effective. Thus, while 3500 questionnaires were distributed in union
meetings, with the membership voting the program “hearty support,” only
129 usable returns were obtained. Mail questionnaires to 320 electricians
yielded, with one follow-up letter, a return of 201 questionnaires, a 63 per cent
return, of which most were usable. For the A. F. of L. trade unions used so
far, questionnaires have been mailed to the entire male membership. Figures
on number returned and usable are shown in Table 1.
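
The two return rates quoted above can be recomputed directly from the counts given. The sketch below is illustrative only: the function name is an assumption, and the two figures are not strictly comparable, since 129 counts usable returns while 201 counts all returns, of which "most" were usable.

```python
def return_rate(returned: int, distributed: int) -> float:
    """Percentage of distributed questionnaires that came back."""
    if distributed <= 0:
        raise ValueError("distributed must be positive")
    return 100.0 * returned / distributed


# Union meetings: 3500 distributed, 129 usable returns (counts from the text).
meeting_rate = return_rate(129, 3500)

# Mail to electricians: 320 mailed, 201 returned after one follow-up letter.
mail_rate = return_rate(201, 320)

print(f"meetings: {meeting_rate:.1f} per cent")
print(f"mail:     {mail_rate:.1f} per cent")
```

The mail figure rounds to the 63 per cent reported; the meeting figure is under 4 per cent.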

tte sampling of occupational groups for the purpose of describing interest 

PED AAI frek iP on en be representative of the entire eau 
s i i roups 
Several factors operate to bine bee ick > extent is this true of our group! 


* The writer wishes to acknowledge the indi 2 A 
in the collection of data from AA Saik oo of Mr. Herbert S. Klapp 




1. Geographically, the samples are highly restricted. Only if the St. Paul
tradesmen are strictly representative of the skilled tradesmen in the
same occupations all over the country will this bias be eliminated. It is intended
that this geographic bias be reduced in later samples, not only by
returns from Minneapolis workers, but by obtaining samples in other
cities.
2. Only skilled tradesmen who are members of St. Paul locals of the
American Federation of Labor are included in the samples. This, obviously,
is a serious source of bias, and one which will need to be corrected. This study
was restricted to the A. F. of L. unions partly as a matter of expediency, and in
part because these unions are trade unions, not industrial unions. Working
with limited funds, this study could not even exploit all of the data-collecting
opportunities provided by the St. Paul Trades and Labor Assembly of the
A. F. of L., and so there was little point in diversifying the required contacts.
3. All members of a particular union local did not return their questionnaires,
and so were never included in a sample. While a 100 per cent sample
would have been ideal, it becomes much too expensive to even try. As noted
in Table 1, the coverage of a particular union is in each instance better than
er cent, but never more than 75 per cent. It is likely that those workers
who responded to our mail appeals represent a different type of person from
those who did not respond. How serious a bias this is cannot easily be assayed.


It may be that these biases which affect all of the groups do not influence
the difference between groups, since the preparation of occupational keys
requires the determination of the interest patterns which differentiate one
occupation from another. Thus, geographic location may produce a large
number of workers who say that they like to fish, but would not affect the size
of differences between groups in this expression of interest.

The present report concerns itself with an analysis of the responses to the
Minnesota Vocational Interest Inventory of workers in the eight civilian
occupational groups for whom samples of some size are available. These groups,
and their sizes, are reported in the last column of Table 1.


Development of a Tradesmen-In-General Group 


In order to score the responses of men in a given occupational group
to show the items which are answered in the same way by these men, it is
necessary to have some basis of comparison with persons not in that
occupational group. Thus, it is not enough to know that 75 per cent of
electricians respond to an item in the same way, for it may be that 75
per cent of all men would respond in that way. To obtain a group of
respondents who would represent a cross-section of all adult men in the skilled
trades has not been possible within the scope of this project. To obtain
an estimate of the responses which such a group would make, the following
procedure was used:


formation given in Strong’s 
e what percentage returns he at 
d a 75 per cent return, or even & 50 per cei 


tained. However, it is probable that he seldom 
nt return. 


296 Kenneth E. Clark 


a. The percentage response of members of each of the eight occupational
groups in the civilian trades being analysed (listed in Table 1) to every item of 
the inventory was computed. 

b. The percentage responses for each of the eight groups were added 
together and divided by eight, giving the average percentage response to each 
item. 

c. This average percentage response was used as the best estimate of the 
percentage response which would be made by members of an actual tradesman- 
in-general group. 

It should be noted that this procedure gives the identical values which 
would have been obtained if a representative and equal sample of the respon- 
dents in each of the eight civilian occupations had been used to make a single 
total group. The procedure operates to eliminate the over-weighting of occu- 
pational groups with a large number of members, and the under-weighting of 
occupational groups of small size. 
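
The averaging in steps a to c can be sketched in a few lines of Python. This is only an illustration: the function names, the two-group example, and its percentages and group sizes are hypothetical and are not taken from the study's data. It contrasts the equal-weight average described above with a pooled percentage that would let large groups dominate.

```python
def tradesmen_in_general(group_percentages):
    """Average each item's percentage response across groups, giving every
    occupational group the same weight regardless of its size."""
    groups = list(group_percentages.values())
    items = groups[0].keys()
    return {item: sum(g[item] for g in groups) / len(groups) for item in items}


def pooled_percentage(group_percentages, group_sizes):
    """Size-weighted alternative, shown only for contrast: large groups
    pull the result toward their own response pattern."""
    total = sum(group_sizes.values())
    items = next(iter(group_percentages.values())).keys()
    return {
        item: sum(group_percentages[g][item] * group_sizes[g]
                  for g in group_percentages) / total
        for item in items
    }


# Hypothetical two-group illustration (not data from the study).
percentages = {"large_group": {"item_1": 90.0}, "small_group": {"item_1": 50.0}}
sizes = {"large_group": 300, "small_group": 100}

equal_weight = tradesmen_in_general(percentages)
size_weight = pooled_percentage(percentages, sizes)
print(equal_weight, size_weight)
```

With these made-up figures the equal-weight estimate sits midway between the two groups, while the pooled figure moves toward the larger group, which is the over-weighting the procedure is meant to avoid.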

This estimate is obviously not an entirely satisfactory solution to the prob- 
lem of getting a tradesmen-in-general group which is truly representative. 
However, it seems likely that for the purposes for which the interest inventory 
is being developed, the procedure gives an adequate base for preliminary 
comparisons between groups. 


Table 1

Numbers of Questionnaires Mailed and Returned for the Eight A. F. of L. Unions
Sampled and Numbers Used in Developing Scoring Keys

                       Number   Number     Per Cent   Number   Per Cent   Number Used
A. F. of L. Union      Sent     Returned   Returned   Usable   Usable     for Keys*

Electricians           320      201        63         166      83         185
Milk Wagon Drivers     608      326        54         218      67         127
Painters               712      390        55         267      68         252
Plasterers             167      111        66          74      67          51
Bakers                 473      305        64         144      47          64
Sheet Metal Workers    298      220        74         164      75          99
Printers               530      331        62         278      84         300
Plumbers               576      347        60         199      57          65
Total                                                                    1143

* N in this column is the number of persons in the different unions who identified
themselves as belonging to a particular occupational group. Thus a few electricians
were found in unions other than the electricians' union itself; within the milk wagon
drivers' union were workers who were not actually milk wagon drivers.


The Preparation of Scoring Keys 


The purpose of a scoring key for a particular occupation is to make possible
the comparison of responses of an individual to the responses of members of
a given occupational group, to determine whether or not the individual’s
responses are like or unlike those of a given group. It is necessary, therefore,
to compare the responses of the group to the responses made by the
tradesmen-in-general group. Consider the example below:


Item 18. Percentage Responses of “Like” Made by:

                                  Tradesmen-in-general   Electricians
a. Be an electrical engineer              61%                90%
b. Be an aeronautical engineer            20%                 5%
c. Be a surgeon                           19%                 5%


It is apparent that electricians have, as a group, a more general preference
for being an electrical engineer than do tradesmen-in-general, although even
the latter group selects this response more frequently than either of the other
responses. We may score a response of “like” to the response a. as a
response counting towards a high score on the electricians’ key, since a
significantly larger proportion of electricians pick this response than do the
tradesmen-in-general. A response of “like” to either of the other two items,
however, may be scored as a response detracting from a high electricians’
score, since a smaller proportion of electricians pick this response than do
tradesmen-in-general. Thus, we might make up our electricians’ key as follows:


a. Be an electrical engineer       A response of + counts 1 point
b. Be an aeronautical engineer     A response of + counts -1 point
c. Be a surgeon                    A response of + counts -1 point


However, the respondent has also selected one of these three items as the
one which he likes least, or dislikes most, and has marked that item with a
mark. Therefore these percentage responses must also be scrutinized.


Item 18. Percentage Responses of “Dislike” Made by:

                                  Tradesmen-in-general   Electricians
a. Be an electrical engineer              12%                 1%
b. Be an aeronautical engineer            21%                21%
c. Be a surgeon                           67%


In similar fashion, these responses may be scored as follows:

a. Be an electrical engineer       A response of - counts -1 point
b. Be an aeronautical engineer     A response of - counts 0 points
c. Be a surgeon                    A response of - counts 1 point
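
The unit-weight scoring illustrated for Item 18 can be written out as a short Python sketch. It assumes the cut-off stated in the Summary of this report (a difference of eleven percentage points or more) and applies it symmetrically in both directions; the function and variable names are hypothetical. The “dislike” entry for the surgeon item is left out because the electricians' percentage for it is not given in the figures above.

```python
MIN_DIFFERENCE = 11  # percentage points, the criterion stated in the Summary


def unit_weight(group_pct, reference_pct, min_difference=MIN_DIFFERENCE):
    """Return +1, -1, or 0 for one response, comparing the occupational
    group's percentage with the tradesmen-in-general percentage."""
    difference = group_pct - reference_pct
    if difference >= min_difference:
        return 1
    if difference <= -min_difference:
        return -1
    return 0


# (electricians %, tradesmen-in-general %) for Item 18, as tabulated above.
item_18 = {
    ("a", "like"): (90, 61),
    ("b", "like"): (5, 20),
    ("c", "like"): (5, 19),
    ("a", "dislike"): (1, 12),
    ("b", "dislike"): (21, 21),
}

electricians_key = {
    response: unit_weight(group, reference)
    for response, (group, reference) in item_18.items()
}


def score(marked_responses, key):
    """Sum the key weights of the responses a person marked;
    responses absent from the key count zero."""
    return sum(key.get(response, 0) for response in marked_responses)


print(electricians_key)
print(score([("a", "like"), ("b", "dislike")], electricians_key))
```

The weights this produces for the five tabulated responses agree with the hand-scored key shown above.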


roup than by the tra 
key for that particular group. A res 
of a particular group than by t 


ed as a minus in the key for that group. | 
S i d? It is apparent that such a response 
Ww shall the blank item be score wt oo aE ere 


eet 2 tho indent” RET however, ignored the blank item 
, 


> has meaning. The present anh TAE responses of like and dislike 


re use the blank Tapon 


Ww great a difference 


s ; $ cent 
with finality. In this report, r Bi eae in a given key. This 


n l uired for using an 
resents Meer TOS between the 6 per cent value used by Strong 


ifference was used as 




and the 20 per cent and higher values used with success by Hathaway and 
McKinley* in the development of the Minnesota Multiphasic Personality 
Inventory. 

Should greater percentage differences contribute more to total score than 
smaller ones? The Strong Vocational Interest Blank increases the differentia- 
tion between groups by assigning greater weights to responses which differ 
markedly in the criterion group from the responses made by men-in-general, 
and assigning smaller weights to responses where the difference is smaller. 
Preliminary data, in this study, however, indicate a possibility that use of 
multiple weights is not required to maximize the differentiation of occupational 
groups. 


Comparisons on the Eight Occupational Keys 


All members of the eight occupational groups were scored on the key 
for their own occupation. The distributions of these scores are listed 
in Table 2. Also presented in Table 2 are distributions of scores on each 
of these keys of persons not employed in the occupation. A comparison 
of each pair of distributions gives an indication of the degree to which the 
occupational scoring keys actually work in separating workers in a given 
field from persons in other skilled trades jobs. 

The distributions of scores of workers outside the occupation were 
obtained by scoring a sample of 25 inventories of workers from each of 
the other seven occupations. The selection of inventories was made on 
a random basis. A comparison of the distribution of scores of these 
samples of 25 with the distributions of the total group on its own occupa- 
tional key showed only slight differences. The scores of each of the 
combinations of seven groups (N = 175) not belonging to a given occupa- 
tion are distributed in Table 2 as the scores of tradesmen-in-general. 

Marked differences exist between the distributions of scores of non- 
members of an occupation and those of employed workers in the occupa- 
tion. Since the primary purpose of this investigation is to determine 
whether or not it is possible to separate members of skilled trades groups 
on the basis of their measured interests, measure of overlap between 
distributions of members and non-members is the appropriate statistic. 
Table 2 presents the per cent of the tradesmen-in-general exceeding the 
median of a given skilled group. This value varies for the different keys 
from 2.0% to 14.3%, with a median value of 6.3%. Thus, about six 
out of 100 workers not in a given occupation make scores above the median
of employed workers on the typical scoring key prepared in this study. 
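
The overlap statistic used here, the per cent of outsiders scoring above the median of the occupational group, is simple to compute. The sketch below is an illustration with made-up scores; the function name is hypothetical, and the strict "greater than" comparison is an assumption about how scores tied with the median would be handled.

```python
from statistics import median


def percent_exceeding_median(member_scores, outsider_scores):
    """Per cent of outsiders whose score on a key is above the median
    score of the occupation's own members on that key."""
    cutoff = median(member_scores)
    above = sum(1 for s in outsider_scores if s > cutoff)
    return 100.0 * above / len(outsider_scores)


# Made-up scores for illustration only.
members = [10, 20, 30]
outsiders = [5, 15, 25, 0]
overlap = percent_exceeding_median(members, outsiders)
print(overlap)
```

A lower value means the key separates the occupation from other tradesmen more sharply.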

The percentage 6.3 does not compare too favorably with the values 
of two or three per cent obtained by Strong in his work with his Vocational 
Interest Blank at the professional levels. Does this difference indicate 
that trades groups are more nearly alike than are professional groups, 


* S. R. Hathaway and J. C. McKinley. Minnesota Multiphasic Personality Inventory,
Manual. New York: The Psychological Corporation.




Table 2 


Distributions of Scores of Workers in a Trade (Group A) and Tradesmen-in-General
(Group B) on Each of Eight Occupational Scoring Keys


Milk Sheet 
Plas- Wagon Elec- Metal 
terers Drivers Printers tricians Painters Bakers Workers Plumbers 


DA BO A B A B ARDIAREN 


4 
26 1 4 
46 4 18 
38 14 at as i EW bie) 
26 20 26 12 13 17 
16 18 19 21 7 27 
1 12 16 20 25 3 28 
1 5 7. 20 1 10 30 6 21 
8 21 5 24 if 4 27 18 
16 6 21 2 414 1 13 19 2 21 
1629 3 136 3 18 34 Gir 118) Me 10 
7 52 15 1 51 3 12 197 0 eo ied 10 10 
1 49 29 9 39 8 473 6 5 3 7 5 
2 30 31 14 41 15 Bee y ANS COEN Ta i Si Lae 6 
9 22 19 44 27 1 211% U 14 4 3 
16 30 25 22 1 6 12 
9 8 9 41 1 9 29 
2 40 7 36 8 28 
23 16 30 
3 2 21 
27 
4 


51 175 127 175 300 175 185 175 252 175 64 175 99 175 65 175 


30 10 —5 —34 6 —32 96 82 13 —2 —12 —48 74 51 91 57 
131 144 168 226 66 212 168 211 
t i 
pa) 3-4 10.3 6.3 6.3 2.0 TA 43 46 


* Per cent of distribution of scores of tradesmen-in-general exceeding median of the
distribution of scores of members of a trades group.


anation? The writer believes that the use of a 


is there another expl € 
deciding to use or not use a given 




that multiple weights were not used in the present keys, but are used by 
Strong, operates to increase percentage overlap. 

Inspection of Table 2 reveals that the eight keys differ not only 
in their power to separate members of a trade from outsiders, but also 
in the kinds of distributions which they produce. One is immediately
struck by the differences in the median raw scores attained by workers 
on their own occupational keys. Electricians have a median score of 
plus 96, while bakers have a median score of minus 12. It is easy enough 
to see how high plus scores are attained, but why should a group get 
a minus score on its own key? The answer is to be found by studying 
the percentage responses of bakers to the items which differentiate them 
from tradesmen-in-general. These item responses are generally un- 
popular items. Tradesmen-in-general select them only 10 to 20 per cent 
of the time; even bakers select each item, on the average, less than 
50 per cent of the time. This is a rather interesting finding, since it 
indicates that interest patterns may operate to differentiate occupational 
groups not only by use of items selected by an overwhelming majority 
of a group, but also by use of items actually rejected by a majority of a
group.


The differences in variability of the distributions of scores reflect
to a considerable extent the number of responses included in the scoring 
key. Since there are 570 items in the inventory, and since either re- 
sponse of like or dislike is scored, a total of 1140 responses are scorable. 
Whereas a large number of items differentiated electricians from non- 
electricians (226), only a small number of items did the same job for 
painters and non-painters (66). This difference is undoubtedly due, in 
part, to the kinds of items included in the inventory. It may also be 
due, in part, to real differences in the degree to which workers within an 
occupational group resemble each other. It is possible that painters, as 
an occupational group, have fewer basic interests in common than do 
electricians. 

A small dispersion of scores is usually associated with lower reliability, 
which suggests that the keys with less variability are also those with high 
percentages of overlap between criterion and control groups. A com- 
parison of the percentage overlap with the number of items in the key, as 
given in Table 2, gives no support to this notion. The present data 
indicate that a small dispersion of scores is not, in itself, undesirable. 


Relationships Between Scoring Keys 


Table 3 presents the correlations between scores on each of the eight
keys obtained for the sample of 200 tradesmen whose inventories were
scored on all keys. These correlations range from high positive to high
negative values, giving the impression that the scores on various keys
tend to cluster in rather meaningful patterns. Thus, the interests of
milk wagon drivers and bakers seem to have much in common, as do the
interests of electricians, sheet metal workers, and plumbers. The
interests of painters, on the other hand, have little relation to those of any
of the other seven groups.

The small number of occupations involved makes it difficult to
determine to what extent these clusterings result from the sampling of
occupational groups in the study, and to what extent they result from real
similarities of interest of the different workers. It is easy to see how
milk wagon drivers and bakers would appear very much alike when
compared with workers in the building trades; whether or not this same
degree of relationship would hold if the sample of occupations were larger


Table 3 


Intercorrelations of Scores on Eight Occupational Keys 
for a Sample of 200 Tradesmen* 


Milk 
Plas- Wagon Elec- Metal 
terers Drivers Printers tricians Painters Bakers Workers Plumbers 


e -11 -60 12. 30A S 
Milk Wagon Drivers 45 —78- 03 Am -78 
inters —68 —.06 68 —81 —.74 
—.19) RI SS a SD 

04 —12 —.18 

-85 —.86 


Metal Workers 
11.2 —28.6 —24.1 565 —0.5 —419 49.3 56.4 


ndard Deviation 13.35 19.00 22.85 32.89 7.80 26.17 25.56 28.14 


* 25 men from each of the eight occupations were scored on all eight keys. Thus,
each key is scored for 25 members and 175 non-members of the given occupation.


and more heterogeneous is not clear. It is fairly certain that the actual
separations of workers from outsiders achieved in this study would have
been considerably more spectacular if a wider diversity of occupations
had been included. The degree to which electricians, plumbers and
sheet metal workers cluster, as do bakers and milk wagon drivers, tends
to obscure the marked differences between the two clusters and the

3 Be . . . 
T nother Malad of portraying the clustering of occupations is used 5 
le 4, in which the median percentile score of each of the eight ep o 
given for two keys—the electrician’s key and the milk wagon driver’ : 
Percentile scores are computed on the distributions of scores oi 




Table 4 
Median Percentile Scores* on Each Occupational Key Attained by 25 Electricians 
and 25 Milk Wagon Drivers 

25 Milk Wagon 25 Elec- 

Key Drivers tricians 
Plasterers 8 5 
Milk Wagon Drivers 48 1 
Printers 5 3 
Electricians 3 51 
Painters 7 5 
Bakers 26 1 
Sheet Metal Workers ri 25 
Plumbers 3 22 


* Percentile scores are computed using the distribution of scores of each group of
employed workers on its own occupational key; that is, the distributions of scores for
group A listed in Table 2.


workers in the occupation. The close relationship between the milk 
wagon driver’s and the baker’s key is indicated when the median score 
of milk wagon drivers on the latter key is at the 26th percentile of the 
baker's distribution of scores. The same sort of clustering occurs be- 
tween electricians, sheet metal workers, and plumbers. 


Summary 
The present report has analyzed the interests of members of eight 
A. F. of L. trade unions. When scoring keys are prepared to differentiate 


between members of a trade group and a composite group of tradesmen- 
in-general, it is found that: 


1. Workers in a trade can be separated from workers in other trades
on the basis of their measured interests with considerable success. About 
six workers out of a hundred will exceed the median score of tradesmen 
in an occupation other than their own. 

2. The separation is achieved with a rather crude criterion for pre- 
paring scoring keys: that the response of the one group differ by eleven 
percentage points or more from the response of the composite tradesmen- 
in-general group. 

3. Distributions of scores on the different scoring keys vary considerably
both in central tendency and in variability, but these values are
not closely related to the goodness of the keys, as defined by the degree 
of separation between workers in and not in the trade. 

4. Correlations between scores on the eight keys indicate a clustering
of trades with respect to measured interests. Workers in three unions
belonging to the building trades (electricians, plumbers, and sheet metal
workers) tend to have related interests, but to differ markedly both from
workers in two service occupations (milk wagon drivers and bakers),
and from workers in two other building trades (painters and plasterers).

5. The data analyzed thus far seem to suggest that skilled trades
keys may be ordered into families of occupations with rather similar
interests, so that it may not be necessary or desirable to differentiate
between closely related occupations either in preparing separate scoring
keys or in the guidance of young persons contemplating entry into these
lines of work. However, this aspect of this program of research requires
considerably more work than has been completed thus far.


Received December 16, 1948.


A Selection Battery for Bake Shop Managers * 


Edwin B. Knauft 
Federal Bake Shops, Inc., Davenport, Iowa 


A number of investigators (1, 3, 6, 11, 16, 18) have attempted to 
develop and validate series of items or test batteries which would effi- 
ciently predict executive or supervisory job success. Some of these 
studies were moderately successful, but the majority reported “validity” 
data which were based only on the original population used in the stand- 
ardization or item analysis of the tests. It is generally recognized that 
the abilities or characteristics contributing to success variance in man- 
agerial positions are difficult to isolate and measure. The problem is 
further complicated by the fact that it is often impossible to obtain a 
relevant and reliable criterion of supervisory job success. In addition, 
validation is difficult because it is unusual for a large number of super- 
visors or managers to be engaged in the same or similar job duties. 

The objective of the present study is to construct and attempt to 
validate a series of written tests which will predict subsequent on-the-job 
behavior of shop managers in a retail—manufacturing bakery chain. 
Seventy-nine managers of bake shops were available for the initial re- 
search, 85 managerial applicants were used in the subsequent develop- 
ment of test norms and 33 new managers were followed up on the job 
and formed the cross validation study. 


The Bake Shop Manager 
The managers were employed by a chain which operates 88 retail- 
manufacturing bake shops in the Midwest, East and South. Each shop 
is under the supervision of a manager who directs both the manufacture 


of ean products from raw materials and the sale of the products to the 
public. 


The principal duties of the manager may b i : - 
A A y be summarized as follows: (1) pur- 
pate raw maera (2) directs and sometimes participates in the manufacture 
9 ada ed products from these raw materials; (3) determines the variety of 
in ucts and quantity of these products which shall be produced each day; 
@ pio ee the cost and determines the selling price of each product; 
ires, discharges and supervises the work of bakers, baker apprentices, sales- 
* This paper is based on a thesis submitted in partial fulfillment for the Degree of 
Doctor of Philosophy at the State University of Iowa. The writer acknowledges his 
oe to Professors Dewey B. Stuit and Harold B. Bechtoldt for their helpful 




and porters; (6) keeps financial records, pays employees and writes checks
for raw materials purchased; (7) sends financial, sales and inventory reports
to the home office; (8) works under the general supervision of a district manager.


The Criterion Problem


Since each manager has primary responsibility for the successful
operation of his unit, it is reasonable to presume that the general financial
condition of the unit, in terms of profit and loss, would reflect the
abilities of the manager. The actual profits of each unit, however, are
not a satisfactory measure of managerial ability because certain expenses
not under the manager’s control affect the profits. The standard company
accounting procedures divide the expenses of each unit or shop into
controllable and uncontrollable costs. Variables largely under the control
of the manager are grouped together and are known as total controllable
costs. The actual dollar volume of these costs is partly a
function of the total sales of a unit, and hence a direct comparison of those
figures from unit to unit would place the manager of a small unit at
considerable disadvantage. For this reason, the ratio of this cost to the
total sales of the unit is computed and affords a unit to unit comparison.
This measure, designated hereafter as total controllable cost, is one possible
criterion measure of managerial ability.


Data on the total controllable costs were obtained for all units from the
Company operating statements. The data were first analyzed by districts
and it was found that there were rather large differences between the
means of certain districts. An analysis of variance was made of these district
data to test the hypothesis that the district means varied from each other only
by chance. The resulting F value of 4.92 is significant at better than the 1%
level of confidence, indicating that these differences may be due to factors
other than chance. It therefore seemed reasonable to use the controllable
cost ratios as a measure of individual manager performance only after these
ratios had been corrected for the “district effect.” A district correction factor
6 cumulative cost percentages of each unit. The base 
tor was the difference between the 1946 controllable 
e corresponding value for the given district. 
h of time in a managerial position and the 
ermine the effect of experience. 

rs from three to six mont 
ntage which was signifi- 

1 manan T voa 
therefore seemed reasonable to 
w the job for three 


imate he values of odd vs. even months, was .96. 

l aka ey ‘the management of the Company felt ree re ad 

aterials cost should be given more weight in a composite criterion, ro m 

flected by the actual contribution of raw mate erat 8 it ES ii 

was available a raw materials percen age Hagar 
les of the unit) which reflects the manager's ability 

Be sickens wisely, to prevent waste during production and, indirectly, 




to correctly set the selling prices of his products. The raw materials percentage 
for each unit for 1946 was used as a second criterion measure. These data 
were analyzed by districts in the same manner as the total controllable costs 
and again the analysis of variance yielded an F value (3.14) which was signifi- 
cant at better than the 1% level of confidence. A correction factor computed 
in a manner similar to that described above for controllable costs, was calcu- 
lated on the basis of the difference between the raw material percentage for 
each district and the raw material percentage for the entire company. The 
correction factor for each district was then applied to the raw material per- 
centage of each unit in the given district. The reliability of the raw material 
measure was estimated by correlating the corrected unit data of odd months 
of 1946 with even months. The resulting corrected coefficient was .92. These 
reliability data were based on 63 units in which there was no change in managers 
during the year. 
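
The district correction described above can be illustrated with a small Python sketch. Several details are assumptions made for the example rather than facts from the study: the company figure is taken as the mean of the unit percentages, the correction is applied additively, and the function name, unit labels, and percentages are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def district_corrected(units):
    """units: list of (unit_id, district, percentage).
    Shift every unit by the difference between the company figure and its
    own district's figure, so that all district means coincide."""
    company = mean(pct for _, _, pct in units)  # assumed: mean of unit values
    by_district = defaultdict(list)
    for _, district, pct in units:
        by_district[district].append(pct)
    correction = {d: company - mean(v) for d, v in by_district.items()}
    return {unit: pct + correction[district] for unit, district, pct in units}


# Hypothetical raw material percentages for four units in two districts.
units = [
    ("unit_1", "A", 36.0),
    ("unit_2", "A", 38.0),
    ("unit_3", "B", 40.0),
    ("unit_4", "B", 42.0),
]
corrected = district_corrected(units)
print(corrected)
```

After correction the two hypothetical districts have the same mean, so the remaining differences between units reflect standing within a district rather than the district itself.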

A Subjective Rating of Performance. The manager’s job is so complex that
many aspects of managership probably are not directly reflected in the two
criterion measures which have been discussed. Some type of merit rating
procedure appeared to be the only technique which would measure those
managerial qualities not reflected in financial data of the units. A survey
of various types of personnel merit rating methods (9) led to the conclusion
that the weighted check-list type of rating scale would be appropriate in the
present situation. This technique, involving the equal appearing intervals
method of Thurstone (17), was first applied in a merit rating situation by
Richardson and Kuder (14).

The procedure used to construct a weighted check-list rating scale for bake
shop managers has previously been reported in detail (10). This scale, which
was comprised of two forms of 24 items each, was used to evaluate the 79
managers in the initial study. The basic rating data were obtained from the
evaluations resulting when each district manager applied both forms of the
scale to his unit managers. The product moment correlation of scores obtained
on the two forms of the scale was .79. The reliability of the combined scale
consisting of both forms was estimated by the Spearman-Brown formula to
be .88. Additional data were available on 35 of the managers who were also
rated by their respective assistant district managers. The reliability
coefficient of the scale, based on the ratings of the 35 managers by two superiors
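
The Spearman-Brown step quoted above can be checked directly. The sketch below uses the standard formula for a test lengthened k times, with k = 2 for the two combined forms; the function name is only illustrative.

```python
def spearman_brown(r, k=2.0):
    """Estimated reliability of a scale lengthened k times, given the
    correlation r between parallel parts: k*r / (1 + (k - 1)*r)."""
    return k * r / (1.0 + (k - 1.0) * r)


combined = spearman_brown(0.79)
print(round(combined, 2))
```

Entering the reported correlation of .79 between the two forms gives a value that rounds to the .88 reported for the combined scale.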

The rating used as a criterion measure for each manager was the mean 
of the scores received on the two forms of the scale. The rating scores assigned 
by the district managers were subjected to an analysis of variance to determine 
if there were significant differences between the mean ratings made by the 
various district managers. The resulting F value of 2.15 was not significant 
at the 5% level of confidence, indicating that the differences in mean ratings 
may be attributed to chance alone.

The effect of job experience on the rating score was checked by comparing
the mean rating scores obtained on nine managers with three to six months’
experience with scores of men who had managed for more than one year. In
the absence of a significant difference between these groups, using the t test
for small samples, it appears that within the time limits studied, variation in
experience is not associated with the average rating score.


Combination of the Criterion Measures. It is first necessary to examine 
the comparability of the three criterion measures in terms of their 
respective means and standard deviations. These data, together with 
the reliability estimates for the three measures, are summarized in Table 
1. These figures are based on data corrected for the district effects.
The table indicates that the three measures are not directly comparable
in their present form because they do not have equal variances nor equal
units.


Table 1
Summary Data of the Criterion Variates

                             Reliability
Criterion                    Estimate      Mean    S.D.
Total controllable costs       .96         68.1    3.3
Raw material costs             .86         37.5    1.9
Rating score                   .88          5.5    0.7


The intercorrelations between the measures presented in Table 2
indicate the extent to which the criterion variates overlap. In interpreting
this table it should be noted that controllable costs and raw
material costs are experimentally dependent, and even the rating score
may not be entirely independent of the above two because the rater’s
knowledge of a manager’s operating data might influence some of the
rater’s responses to the rating form. The raters, however, did not see any
operating data after district corrections had been made. The
intercorrelations of Table 2 suggest that these variables may have a
conspicuous common element which we may call managerial success or


Table 2 
Intercorrelations of Criterion Variates 


Comparison 


Controllable costs vs. raw material costs 

Controllable costs vs. rating score $ 

Raw material costs vs. rating score Al 
i i ximation of the true 

This coefficient of .65 may be regarded only as an approximation o! 

ati itions: (1 raw material costs are actually a portion of 

tion because of two conditions: ( ) Seea acest ia aes 


ce the two measures i 
ee ratios which have the same denominator, 


. An application of this correction here yields a 


ly low because it is based upon the assumption that } 
gible and that the two ratios have & zero correlation. This yang ne onie 
by a partial correlation technique which is inappropriate Cause 


all number of cases. 




into normalized standard scores, using the percentile method, and the 
three standard scores were averaged for each individual manager. This 
procedure gave equal nominal weights to each of the three variables, but 
the effect of the intercorrelations and the variance relationship between 
total controllable costs and raw material costs actually gives a greater 
effective weight to raw material costs. Because of the greater impor- 
tance attached to this latter variable by top management, the weighting 
used here is in the desired direction and appears to yield a satisfactory 
combined criterion measure. 
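
One way to read the combination step (conversion to normalized standard scores by the percentile method, then averaging) is sketched below. The mid-percentile convention for ties, the sign reversal for the two cost measures so that a higher score always means better performance, and all names and figures are assumptions for illustration, not details confirmed by the text.

```python
from statistics import NormalDist


def normalized_scores(values, higher_is_better=True):
    """Convert raw values to normal deviates through their percentile ranks
    (mid-percentile convention), flipping the sign when low values are good."""
    n = len(values)
    scores = []
    for v in values:
        below = sum(1 for x in values if x < v)
        ties = sum(1 for x in values if x == v)
        percentile = (below + 0.5 * ties) / n  # always strictly between 0 and 1
        z = NormalDist().inv_cdf(percentile)
        scores.append(z if higher_is_better else -z)
    return scores


def composite(*score_lists):
    """Equal nominal weights: the plain mean of the standard scores."""
    return [sum(person) / len(person) for person in zip(*score_lists)]


# Hypothetical figures for three managers.
controllable = normalized_scores([66.0, 68.0, 71.0], higher_is_better=False)
raw_material = normalized_scores([36.0, 38.0, 39.0], higher_is_better=False)
rating = normalized_scores([6.1, 5.5, 4.9])
criterion = composite(controllable, raw_material, rating)
print(criterion)
```

Equal nominal weights in such an average do not guarantee equal effective weights, since two correlated components jointly pull the composite more than an independent one, which is the point made in the paragraph above.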


Preliminary Test Battery 


The preliminary test battery was assembled and administered to all 
bake shop managers. Those managers who had three or more months 
experience formed the criterion group which was used in the evaluation 
and item analyses of the several tests. Following is a brief description of 
these tests and preliminary results obtained from them. 


General Mental Ability. The research of previous investigators (1, 6, 12)
indicates that there is some positive relationship between scores on short
mental ability tests and job success in certain managerial or supervisory
positions. The Wonderlic Personnel Test, a 12 minute revision of the Otis
S-A Test, scored by taking the number of correct responses made in the 12
minute time interval and correcting the score for age by the use of Wonderlic’s
table (19), was included in the present battery. The scores of the population
of 79 managers ranged from 7 to 34 with a mean of 19.8 and an S. D. of 6.5.
A significant relationship between test score and educational level is revealed
by a correlation of .39. The correlation of test score with the composite cri-
terion was .13, while a correlation of .20 was obtained between test score and
size of unit the manager operated. The latter value just fails to be significant,
for an r of .22 is required for the 5% confidence level for this size of sample.
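
The required r of .22 for 79 cases can be checked approximately. The sketch below assumes a two-tailed test and uses Fisher's z transformation as a convenient approximation; the author presumably read the value from a published table, and the function name is hypothetical.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Approximate smallest |r| significant at a two-tailed level (Fisher z)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Under the null hypothesis, atanh(r) has standard error 1 / sqrt(n - 3).
    return tanh(z_crit / sqrt(n - 3))


print(round(critical_r(79), 2))  # 0.22
```

On this approximation the obtained correlation of .20 falls just short of the required value, as stated in the text.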
“Solita Interests and Attitudes. Previous investigators agree that the 
personality of the manager or supervisor is one of the most important single 
elements contributing to job success. Tests in this area have thus far been 
rather unsuccessful as selection instruments, partially because such paper 
and pencil inventories can often be “beaten” by the applicant. In the selection 
situation the applicant’s motives may lead to responses which he thinks will
help him obtain the position. A second shortcoming of the usual “personality
test” is that it is scored in terms of a number of traits such as dominance, intro-
version, frankness, etc. Since it cannot be precisely determined if these traits
are required for success in a given job, the industrial validation of such a
test is a difficult and [...] procedure.
Jurgensen (7) has recognized the shortcomings of the common types of
personality inventories, when used as selection instruments, and has constructed
a test which he believes possesses distinct advantages in the industrial
situation. He has utilized the forced choice technique of item arrangement
which has also been used in the selection of army officers (15). The three
main advantages of Jurgensen’s Classification Inventory are that: 1. the
applicant is generally not able to predict the “right” answers when attempting
to apply for a job; 2. the test is scored and validated on a specific job and a
scoring key is developed for the job in a given company; and 3. no hypothetical
traits are necessary because each item can be correlated separately
with the criterion.


Selection Battery for Bake Shop Managers 309 


A tentative scoring key for bake shop managers was developed in accord-
ance with the procedure recommended by Jurgensen (8). The criterion
population of 79 managers was split into two groups which comprised the
highest 27% and lowest 27% in terms of the composite criterion. The scoring
keys were based on items which differentiated between the high and low
criterion groups at the 10% level of confidence or better. The resulting
scores of the managers correlated .64 with the composite criterion, but this
value must be interpreted with extreme caution because 54% of this popula-
tion was used in the construction of scoring keys for this test.
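
The high-low item analysis described above can be sketched roughly as follows. The article does not state which significance test was applied to each item, so a two-proportion z ratio is assumed here purely for illustration; the function names and the example data are hypothetical and do not reproduce Jurgensen's procedure.

```python
from math import sqrt


def split_extreme_groups(criterion, fraction=0.27):
    """Return index lists for the lowest and highest `fraction` of cases."""
    order = sorted(range(len(criterion)), key=lambda i: criterion[i])
    k = max(1, round(len(criterion) * fraction))
    return order[:k], order[-k:]


def item_discrimination(responses, low, high):
    """Difference in endorsement rates (high minus low) and its z ratio.

    `responses` holds 1 where a person chose the alternative, else 0.
    """
    p_low = sum(responses[i] for i in low) / len(low)
    p_high = sum(responses[i] for i in high) / len(high)
    pooled = (p_low * len(low) + p_high * len(high)) / (len(low) + len(high))
    se = sqrt(pooled * (1 - pooled) * (1 / len(low) + 1 / len(high)))
    z = (p_high - p_low) / se if se > 0 else 0.0
    return p_high - p_low, z


# Hypothetical example: 79 managers ranked 0..78 on the composite criterion.
low, high = split_extreme_groups(list(range(79)))
diff, z = item_discrimination([0] * 40 + [1] * 39, low, high)
print(len(low), len(high), diff, round(z, 2))
```

With 79 cases each extreme group holds 21 managers, which is why slightly more than half of the population enters the construction of the key. Under the assumed test, an alternative would be keyed when |z| reaches about 1.645, the two-tailed 10% value.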


h validit; 
nalysis of the 


b 


Judgment in Managerial Problems. A number of items were constructed
to represent some of the decisions and judgments a bake shop manager must
make. The items were arranged in multiple choice form, but the applicant
is required to select the first, second and third best choices from the
alternatives in each item:

Assume your best selling item is a pecan ring that sells for [...] each.
This item accounts for 10% of your sales. Pecans are selling for 40¢ per pound.
Then the price of pecans jumps to $1.25 per pound because of crop failure.

A. Stop making pecan rings and try to build up sales on another item;
B. Raise the price of pecan rings to 60¢ because the material cost
will be in line at this price;
C. Keep the price the same and try to make up your loss by increasing
sales on other items;
D. Use a third as many pecans as before and leave the price the
same.

Items of this type have been grouped together as the Federal Management
Test. A rank order of several [...] “best” [...]. The item analysis and scoring
keys were based on the responses of the top and bottom [...]



The scoring key was constructed by computing the percentage of “high” 
and “low” criterion managers responding to each item alternative as a first,
second or third choice. This analysis yielded 12 items containing one or more 
responses which successfully differentiated between the high and low criterion 
groups. Scores on this test correlated .52 with the composite criterion. 

A second test was constructed which was designed to measure managerial 
judgment in a specific context. One of the important daily duties of the 
manager is that of ordering the quantity and selection of baked goods to be 
produced for each day. This requires an accurate estimate of expected volume
of business on the following day so that the shop will not be sold out before 
closing time and, conversely, that few or no items need be carried over as 
“stales.” The Bake Order Problem was constructed on the assumption that 
a hypothetical store under given weather conditions could serve as a basis for 
measuring an individual’s ability to use correct judgment in making out the 
order. After this problem was administered to the criterion population, 
responses to different portions of the problem were analyzed to determine if 
there were significant differences between the high and low 27% criterion 
groups in terms of quantity of each item ordered and variety of items ordered. 

A total of 13 analyses were made on different portions of the problem, but all
results were negative and this problem was omitted from the revised battery. 

Biographical Data. As early as 1922 Goldsmith (5) found that biographical
information items were helpful in the selection of insurance salesmen. Uhr- 
brock and Richardson (18) and the Army (15) have both used such items in 
personnel selection batteries. Personal data were collected in the present 
study by means of a Biographical Information Blank which included 33 items. 
Since these items were first being used on present managers, it was necessary 
for the testee to answer each item as it applied when he first became a manager. 
For example: 

How many of the following did you own or were you buying just before 
you became a Federal manager? (Mark as many answers as apply): A. Stocks 
or bonds ($100 to $300); B. Stocks or bonds (more than $300); C. A house; 
D. Home furnishings; E. A bakery; F. A car.

An item analysis was performed on the 33 biographical items using the high
27% and low 27% of the criterion groups responding to each alternative. 
The resulting response frequencies for most item alternatives were extremely 
small and consequently the validity estimates were unstable. It was also 
found that only a small number of items discriminated between the two 
criterion groups. For these reasons the Biographical Information Blank was 
omitted from further consideration. 

Name and Number Checking. The typical bake shop manager spends about 
one hour per day on report work and simple bookkeeping. In the light of
these activities the Minnesota Clerical Test was included in the preliminary
battery. The results obtained indicated a lack of correlation between responses
on this test and the composite criterion. The correlation was .19 between
“numbers” score and the criterion and -.15 between “names” score and the
criterion. The 79 managers obtained a mean score of 108.7 and an S. D. of
30.3 on the Numbers Test and a mean of 92.5 and S. D. of 28.5 on the Names
Test. These mean scores are rather low when compared to norms for male
clerical workers reported by Andrew and Paterson (2). It may be hypothesized
that the managers in the present study do not perform enough clerical work
or can do this work at their own pace in a manner which is not identical with
the [...] speed factor which probably operates in the Minnesota Clerical
Test.


The Revised Battery 
The following tests were included in the revised battery: Classification
Inventory, Baking Knowledge Test, Federal Management Test and
Wonderlic Personnel Test. This battery was administered to a new popu-
lation of 85 manager applicants already in the employ of the company as
[...]rs. This population was used to establish tentative norms for the
tests. Thirty-three of these applicants were selected for manager train-
ing and subsequently became managers. These men formed the cross
validation population. In addition, 23 present managers who had taken
the preliminary battery were retested on the revised battery to furnish
reliability data on the various tests.

The population of 85 applicants had a mean age of 32.8 years, a mean
education of 10.0 grades and a mean of 11.2 years civilian baking ex-
perience. Corresponding data for the original group of managers indi-
cate a mean age of 42.7 and mean education of 10.0 grades.

Reliability of the Battery. Reliability data are based on 23 managers
who were retested seven months after the original testing. The reli-
ability estimates for the battery are presented in Table 3.
The low reliability of the Federal Management Test casts doubt on [...]


Table 3
Test-Retest Reliability Estimates for Tests in the Revised Battery (N = 23)

Test                                          Reliability (r)
Classification Inventory                          .78
Baking Knowledge                                  [...]
Federal Management                                .46
Wonderlic Personnel Test (Forms A and B)          [...]


The mean scores obtained by these 23 men on test and retest sessions
were analyzed to determine if any significant shifts had occurred. It
was found that there was no significant change in group mean scores for
the Classification Inventory and Federal Management Test. However,
mean scores on both the Baking Knowledge Test and Personnel Test
increased significantly. These differences were significant at better than
the 1% confidence level. It is possible that familiarity with the testing
situation, positive practice effect and memory may have accounted for
the increase in scores on the latter two tests.

Intercorrelations of Test Scores. One of the interesting [...] ob-
tained from the applicant group is revealed [...] intercorrelations [...]
several tests. The test intercorrelations from [...] and
from the applicant population are presented [...]. This [...]
shows that the original intercorrelations on all tests except the Personnel
Test were much higher than the values obtained from the applicant


population. This finding might be anticipated because the Personnel 
Test was the only one of the four tests which was not item analyzed or 
scored on the basis of the responses of the original population of managers. 
None of the intercorrelations involving the Personnel Test shifted 
markedly on the new population, whereas the other values show a definite 
decrease for the applicant group. Such results accent the fact that 
correlational values obtained from a population which is used in the con- 
struction or item analysis of tests will generally be spuriously high. 
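
The point about spuriously high values can be illustrated with a small simulation. It is not a reanalysis of the study data: the items below are pure random noise, the sample sizes merely echo the 79 managers and 85 applicants, and all names and settings are hypothetical. Items are selected and keyed on one sample, and the resulting score is then correlated with the criterion in that sample and in a fresh one.

```python
import random


def pearson_r(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_build=79, n_fresh=85, n_items=150, keep=30, seed=1):
    """Key noise items on one sample, then score both samples with that key."""
    rng = random.Random(seed)

    def sample(n):
        criterion = [rng.gauss(0, 1) for _ in range(n)]
        items = [[rng.randint(0, 1) for _ in range(n_items)] for _ in range(n)]
        return criterion, items

    crit_a, items_a = sample(n_build)
    crit_b, items_b = sample(n_fresh)

    # Correlate every item with the criterion in the construction sample.
    item_r = [pearson_r([row[j] for row in items_a], crit_a)
              for j in range(n_items)]
    chosen = sorted(range(n_items), key=lambda j: abs(item_r[j]),
                    reverse=True)[:keep]
    key = {j: (1 if item_r[j] > 0 else -1) for j in chosen}

    def score(row):
        return sum(w * row[j] for j, w in key.items())

    r_build = pearson_r([score(r) for r in items_a], crit_a)
    r_fresh = pearson_r([score(r) for r in items_b], crit_b)
    return r_build, r_fresh


r_build, r_fresh = simulate()
print(round(r_build, 2), round(r_fresh, 2))
```

Because the items carry no real information, any sizable correlation in the construction sample arises from capitalizing on chance, and it should largely vanish in the fresh sample.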


Table 4
Intercorrelation of Tests

Note: Superior values in each cell are based on 85 applicants.
Values in parentheses are based on criterion population.

                              Baking       Federal      Personnel
                              Knowledge    Management   Test
Classification Inventory       -.16         -.02          .38
                               (.44)        (.48)        (.28)
Baking Knowledge                             .04          .17
                                            (.41)        (.19)
Federal Management                                       -.10
                                                         (.00)


The only significantly positive intercorrelation for the applicant 
population is between the Personnel Test and the Classification In- 
ventory. Thus these two measures are somewhat dependent, although 
the size of the correlation does not indicate a marked “overlap.” Only 
the Baking Knowledge Test correlated significantly (r = .45) with num- 
ber of years of baking experience. None of the tests correlated signifi- 
cantly with age and only the Personnel Test correlated significantly with 
number of years of education (r = .43). 

Validity Data. Thirty-three of the 85 applicants were appointed as
unit managers. These men were given a Company manager training
program before being assigned to a managerial position. The 33 men had
actually been managing for an average of 8.8 months when the follow-up
study was conducted and criteria of their job performance were collected.
Criterion data were obtained on these men by the same procedures as
were used in the original study and the composite criterion used in the
present study is identical in composition with that used in the original
study.

It should be pointed out that the district manager (who was the rater) 
knew the new manager’s test scores in ten out of the 33 cases. This 




The validity coefficients of the various tests in the revised battery
are presented in Table 5. Correlation coefficients are not very appropri-
ate measures for such a small sample. In the present sample the Classifi-
cation Inventory has a coefficient which is significant at better than the
5% level. None of the other coefficients in Table 5 are significant.

An alternate method of estimating the validity of the several tests is
to compare the test scores of men who had high criterion scores with the
scores of men who received low criterion scores. For this analysis the
population of 33 managers was divided into the best 16 and poorest 16
on the basis of the composite criterion measures. The remaining case
was omitted from this analysis. The “high” group of managers made
significantly higher scores than the “low” group on two of the tests, the
Classification Inventory and the Personnel Test. In both cases, the t
values for the differences between mean scores were significant at better
than the 5% confidence level. The Baking Knowledge Test and the
Federal Management Test failed to differentiate between good and poor
managers.

Table 5
Correlations Between Test Scores and Criterion (N = 33)

Test                              Validity Coefficient
Classification Inventory                  .39
Baking Knowledge                         -.12
Federal Management                        .06
Wonderlic Personnel Test                  .26

It was decided to eliminate the Federal Management Test from further
use as a selection device because of its lack of validity and its low reli-
ability coefficient (r = .46). Although the Baking Knowledge Test did not
appear to be a valid predictor of managerial success, this test was
retained in the battery because it would be useful in determining the
amount of baking training the applicant would require before he was
installed as a manager.

[...] set cutting scores for these tests. A passing score
of 16 on the Personnel Test would have eliminated 44% of the “low”
managers and [...]9% of the “good” managers. [...]
percentile rank of 25, as determined from the population of 85 applicants.




Similarly, a passing score of 25 (percentile rank of 48) on the Classification 
Inventory would have eliminated 50% of the “poor” managers and 36% 
of the “good” managers. If these two tests were used together and both 
of the above cutting scores had been used in combination, 63% of the 
“poor” managers would have been rejected as opposed to 36% of the 
“good” managers. On the basis of these combined cutting scores, 45% 
of the 85 applicants would have been considered as acceptable for mana- 
gerial training. However, these cutting scores should be considered as 
tentative until they are validated on a second independent sample. 
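
The effect of combined cutting scores can be tabulated in a few lines. In the sketch below only the two cutting scores (16 and 25) come from the text; the manager records are invented for illustration, the function name is hypothetical, and a "passing score" is assumed to mean that a score at or above the cutoff passes.

```python
def rejection_rates(managers, cutoffs):
    """Proportion of each criterion group rejected by any one of the cutoffs."""
    rates = {}
    for group in ("good", "poor"):
        members = [m for m in managers if m["group"] == group]
        rejected = [m for m in members
                    if any(m[test] < cut for test, cut in cutoffs.items())]
        rates[group] = len(rejected) / len(members)
    return rates


# Hypothetical records; not the study data.
managers = [
    {"group": "good", "personnel": 22, "classification": 31},
    {"group": "good", "personnel": 18, "classification": 27},
    {"group": "good", "personnel": 15, "classification": 30},
    {"group": "good", "personnel": 20, "classification": 26},
    {"group": "poor", "personnel": 12, "classification": 28},
    {"group": "poor", "personnel": 19, "classification": 21},
    {"group": "poor", "personnel": 14, "classification": 20},
    {"group": "poor", "personnel": 17, "classification": 29},
]
cutoffs = {"personnel": 16, "classification": 25}
print(rejection_rates(managers, cutoffs))  # {'good': 0.25, 'poor': 0.75}
```

Using two cutoffs together can only reject as many or more people than either cutoff alone, which is why the combined rejection rate for poor managers in the text exceeds both single-test rates.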


Summary 

A study has been made of the prediction of managerial success in a 
retail-manufacturing bakery chain. An empirical approach has been 
used to select tests and to select and weight items on the basis of the 
responses of a criterion population of 79 managers. Three criterion 
measures were obtained on these managers and these were combined into 
a composite criterion score for each manager. 

A preliminary battery of seven tests was administered to the 79 
managers. A subsequent item analysis suggested that a revised battery 
be assembled. This contained the Baking Knowledge Test, the Wonderlic
Personnel Test, the Classification Inventory and the Federal Manage- 
ment Test. This revised battery was administered to 85 applicants for 
managerial positions. New test norms, intercorrelational data and reli- 
ability data were obtained from this population. All tests in the revised 
battery except the Federal Management Test had acceptable reliability 
coefficients. 

The validity of the battery was estimated by comparing the test
scores and criterion scores of 33 of the applicants who subsequently be-
came managers. Only one of the tests, the Classification Inventory,
had a validity coefficient which was significantly different from a zero
correlation at the 5% level of confidence. A comparison of the mean
test scores of the upper and lower halves of this group, based on the
criterion, indicates that both the Wonderlic Personnel Test and the
Classification Inventory significantly differentiated between the good and
poor managers. The Federal Management Test and the Baking Knowl-
edge Test lacked validity, but the latter test was of value in determining
how much additional baking training was required by the individual.
Received December 6, 1948. 


References
1. Achard, F. H., and Clarke, F. H. You can measure the probability of success as
a supervisor. Personnel, 1945, 21, 353-373.
2. Andrew, D. M., and Paterson, D. G. Manual for the Minnesota Clerical Test.
New York, The Psychological Corporation, 1946.
3. Beckman, R. O., and Levine, M. Selecting executives: an evaluation of three tests.
Person. J., 1930, 8, 415-420.
4. Cardall, A. J. Manual for the test of practical judgment. Chicago, Science Research
Associates, 1942.
5. Goldsmith, D. B. The use of the personal history blank as a salesmanship test.
J. appl. Psychol., 1922, 6, 149-155.
6. Harrell, W. Testing cotton mill supervisors. J. appl. Psychol., 1940, 24, 31-35.
7. Jurgensen, C. E. Report on the “Classification Inventory,” a personality test for
industrial use. J. appl. Psychol., 1944, 28, 445-460.
8. Jurgensen, C. E. Manual for Classification Inventory, privately printed, 1947.
9. Knauft, E. B. A classification and evaluation of personnel merit rating methods.
J. appl. Psychol., 1947, 31, 617-625.
10. Knauft, E. B. Construction and use of weighted check-list rating scales for two
industrial situations. J. appl. Psychol., 1948, 32, 63-70.
11. Mandell, M. Testing for administrative and supervisory positions. Educ. &
Psychol. Meas., 1945, 5, 217-228.
12. Mandell, M., and Adkins, D. C. The validity of written tests in selection of
administrative personnel. Educ. & Psychol. Meas., 1946, 6, 293-312.
13. Peters, C. C., and Van Voorhis, W. R. Statistical procedures and their mathematical
bases. New York, McGraw-Hill, 1940.
14. Richardson, M. W., and Kuder, G. F. Making a rating scale that measures.
Person. J., 1933, 12, 36-40.
15. Staff, Personnel Research Section, Adjutant General’s Office. The forced choice
technique and rating scales. Amer. Psychologist, 1946, 1, 267 (Abstract).
16. Thompson, C. E. Selecting executives by psychological tests. Educ. & Psychol.
Meas., 1947, 7, 773-778.
17. Thurstone, L. L. Attitudes can be measured. Amer. J. Sociol., 1928, 33, 529-554.
18. Uhrbrock, R. S., and Richardson, M. W. Item analysis: the basis for constructing
a test of supervisory ability. Person. J., 1933, 12, 141-154.
19. Wonderlic, E. F. Wonderlic Personnel Test Manual. Privately printed, 1945.


A Note on Mechanical Aptitude of West Texans 


Albert Barnett 
Texas Technological College 


Two tests claiming to measure mechanical aptitude have been rather 
widely used at Texas Technological College. For a number of years, 
freshmen, as part of an orientation program, were given the Revised 
Minnesota Paper Form Board, a paper-and-pencil test requiring the 
testee to combine in his imagination a few disarranged geometrical plane 
figures to form one large figure and select the correct answer from among 
four or five suggested solutions. The Minnesota Spatial Relations test 
requires that the testee fill a number of irregular holes in each of four 
form-boards with the appropriate cut-out blocks, no two of which are 
alike, the score being the number of seconds required to complete the 
task. This test has been used for some time on an individual basis at 
the Texas Tech. Guidance Center. 

It is evident that neither of these tests places much, if any, emphasis 
on mere hand skills, but on the mental factor of space relationship, which, 
it is claimed, accounts in part for mechanical aptitude. 

During the fall semester of 1941, the Revised Minnesota Paper Form
Board, Series AA, was run on 371 freshmen (mainly from West Texas) of the
Arts and Science Division of Texas Technological College. Their mean
chronological age was between 18 and 19 years, the approximate range
being 15-23 years. Their median score was 42.5, equivalent to the
70th percentile on the norms of liberal arts freshman men, whose median score
was 38. The fact that these young freshman liberal arts boys, on the
average, excelled 70 per cent of the standardization group in mechanical
aptitude was merely noted, but not explained.

During several months in 1947-1948, a record was made of the scores 
of men on the Minnesota Spatial Relations Test coming to the Texas
Tech. Guidance Center for vocational advisement. These men, mainly
in their twenties and thirties, came from several different West Texas
counties, and represented every educational level from the illiterate to the
college graduate. Each man was tested individually by a trained
psychometrist. Results are shown in Table 1. It may be noted that the mean
time required by this sample of 383 men to complete the test was 973.5 
seconds, which compares to the mean (apparently) of 1279 seconds for 
the norms furnished by the publishers of the test. The difference be- 


Table 1
Minnesota Spatial Relations Test Results
(383 Men, Texas Tech. Guidance Center)

Time Required for
Completing Test        f      Percentile    Standard Score
[...]                  3        99.19           6.90
[...]                 16        96.72           6.45
[...]                 59        86.97           6.00
[...]                 87        67.99           5.55
[...]                 74        47.06           5.10
[...]                 63        29.25           4.65
[...]                 29        17.29           4.21
[...]                 21        10.79           3.76
[...]                 11         6.63           3.31
[...]                  6         4.42           2.86
[...]                 10         2.34           2.41
[...]                  0         1.04           1.96
[...]                  1          .91           1.52
[...]                  1          .65           1.07
[...]                  2          .26           0.62
N                    383
eu = 114 e = 222.81 
Table 2
Norms for Men Compared to Achievement at the Texas Tech. Guidance Center

                    Time in Seconds for All Four Boards
Rating Score        Texas Group           Published Norms
7.0                 Up to 639             Up to 936
6.0                 640-861               937-1131
5.0                 862-1085              1132-1427
4.0                 1086-1307             1428-1934
3.0                 1308 and above        1935 and above

tween the Texas group and the norm group is revealed in Table 2, which
shows the score range equivalent to the letter ratings on the test for the
Texas group compared to the published test norms.¹

¹ Minnesota Spatial Relations Test: Examiner’s Manual ([...] Test Bureau), p. 3.

As yet, no satisfactory explanation has been found for this superiority
(as tested) of West Texas men in mechanical aptitude. It is true that
[...] region from which the Texas [...]



farming. Most of these men had been accustomed to tractors and other 
machines since boyhood. Some worked with machines in the oil fields of 
this region. The Spatial Relations Test, however, is supposed to be, as 
stated in the Examiner’s Manual, “relatively free from the influence of 
previous mechanical experience.” 

As stated previously, the men who were tested at the Texas Tech. 
Guidance Center were young men. It is not known whether or not they 
were as a group younger than the standardization group. As a check 
on the possible influence of age on test performance, the Texas group 
was separated into two discrete sub-groups, namely, those requiring
one thousand seconds or more to complete all four boards of the test and 
those who completed the test in eight hundred seconds or less. The 
former group had a mean age of 26.7 years as compared to 24.2 for the 
latter, the standard error of the difference being .66 and the critical ratio 
3.76. While it is true that the poorer performance is associated with the 
older group, there is much over-lapping. Furthermore, it is possible that 
among the older men, those who had failed to adjust occupationally be- 
cause of poor mechanical aptitude, tended to present themselves for 
testing and guidance more than was the case of those who had adjusted. 
Further study needs to be made on the relationship of tested hand skills
to tested mechanical aptitudes. 
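
The critical ratio for the age comparison above is simply the difference between the two mean ages divided by its standard error. A minimal check using the rounded figures given in the text (the function name is hypothetical):

```python
def critical_ratio(mean_a, mean_b, se_diff):
    """Difference between two means expressed in standard-error units."""
    return (mean_a - mean_b) / se_diff


cr = critical_ratio(26.7, 24.2, 0.66)
print(round(cr, 2))
```

The rounded inputs give a ratio of about 3.8, in line with the reported 3.76; the small gap presumably reflects rounding of the published means and standard error.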


Received December 16, 1948. 


Work Satisfaction and Work Efficiency of Vocational
Counselors as Related to Measured Interests *


Salvatore G. DiMichael 
Office of Vocational Rehabilitation, Federal Security Agency 


This article reports another phase of a broad study designed to obtain
a more complete understanding of personnel engaged as vocational re-
habilitation counselors for the civilian disabled. A previous article de-
scribed the experimental study which devoted major attention to a
determination of the pattern of measured interests and of the relation-
ship between measured and self-estimated interests for a group of coun-
selors. It was found that the typical profile of measured interests on
the Kuder Preference Record was sharply differentiated from the general
population; that the highest median vocational interest areas were Social
Service (98 %ile), Persuasive (82 %ile), and Literary (65 %ile); that the
reliability coefficients for the scales ranged from .70 to .89 with an average
interval of 5 months between tests; that self-estimated interests
generally correlated to a substantial degree (median r = .56) with meas-
ured interests; and that when the counselors had previous knowledge of
their Kuder results, it did not change the subjective expressions of their
interests in the direction of the objective preference scores (1).
In the present report, the experimental investigation primarily deals
with the possible relationships between the measured interests of voca-
tional rehabilitation counselors and their work satisfaction, and work
efficiency. This study sought to determine whether the Kuder results
would give a basis for predicting varying degrees of work satisfaction and
work efficiency among a selected population of counselors who were
already on the job when the experiment was begun.

On the basis of a critical review of the experimental literature on the
Kuder Preference Record, Super states that “the evidence justifies the
conclusion that the Kuder Preference Record has now been sufficiently
well standardized and validated for use in vocational guidance. [...]
research needs to be done before the Record can be considered a
well-understood instrument, but it is already a valuable tool in the
counselor’s kit” (6, p. 191).
Evidence of the validity of the Kuder Preference Record in terms of
enjoyment and efficiency on the job is, according to Kuder’s 1946 manual
(4), found in only one study. In the latter, Hahn and Williams (3)
reported significant differences between mean scores on the clerical scale
for satisfied and dissatisfied workers of three clerical groups of women
Reservists in the Marine Corps.

* The author gratefully acknowledges the assistance given by Donald H. Dabelstein
in the initial steps of the study.


Method 


While conducting orientation institutes for counselors engaged in 
the State-Federal vocational rehabilitation program for physically and 
mentally disabled civilians, the author administered the Kuder Preference 
Record to the trainees. They were assured of the confidentiality of in- 
dividual results and were requested to turn in their interest profiles for use 
in an experimental investigation about the interest patterns of rehabilita- 
tion counselors. Five months later on the average, they were requested 
to retake the Kuder and also to fill out a Survey Sheet which recorded 
their degree of interest with the job of counseling taken as a whole and 
with distinguishable phases of it. At the same time, a prepared Job 
Rating Schedule was sent to each of the counselor’s supervisors who 
were requested to rate the men for efficiency on the job as a whole and 
various phases of it. Of the initial group of 134 counselors, 10 had 
resigned in the meantime and 24 had some of the necessary records 
missing; the remaining number of 100 is referred to collectively as 
Group A. 

Group B was made up of 46 counselors who had never taken the 
Kuder inventory before. They first were requested to fill out the items 
in the Survey Sheet which included questions about work satisfaction. 
Then the Preference Record was administered. In the present study, 
data on Group B enter into the experimental results only on job satis- 
faction. No job efficiency ratings were secured on this group. 


Counselors’ Ratings on Work Satisfaction 


The counselors were asked on the Survey Sheet (graphic rating scale 
method) to rate the degree of their liking for the job as a whole and for 
particular phases of the job (9 items). 

The checked ratings were converted to numerical scores from 0 to 20.
The results in terms of means and standard deviations for Groups A 
and B are listed in Table 1. By a comparison of the averages of the 
counselors’ ratings on the different scales, it is possible to estimate the 
relative degrees of satisfaction with several phases of their work. The 
highest amount of work satisfaction seemed to be derived from “inter- 
viewing clients” and the “job as a whole.” Other phases of the work 
which gave high average satisfaction scores were “promoting the program 
to the public,” “contacting employers for jobs,” “reading scientific 


Table 1

Item                                          Group   Mean   S.D.     t
[...]
Reading Scientific Lit. on Rehabilitation     [...]
Experimenting with Guidance Techniques        [...]
Writing Case Histories                          A     12.7   4.04   -.23
                                                B     12.9   5.27
Handling Clerical Details                       A     10.5   4.86   1.80
                                                B      8.9   5.07
Rehabilitation Work After Business Hours        A     12.5   4.50   3.38
                                                B      9.4   5.29

* Conversion of ratings to numerical scores was made on basis of a scale of units
from 0 to 20.
** N for A = 100; for B = 46.
Values required for statistical significance at 5% level of confidence = 1.98; at
1% level of confidence = 2.61 (5, pp. 212-3).


literature on rehabilitation,” and “experimenting with guidance tech-
niques.” Less enjoyment was reported from “writing case histories”
and doing “rehabilitation work after hours.” The phase of the job
liked least was “handling clerical details.”

From an inspection of the frequency distributions on the above items,
it appeared that the converted scores on the different rating scales were
not normally distributed. All but one appeared to be considerably
skewed. The range of scores was very wide on all items except two,
namely “whole job” and “interviewing.” In the latter, the ranges were
[...] levels of the scales.

[...]ing with work enjoyment, the differences
in average self-ratings between Groups A and B appeared slight. How-
ever, it was necessary to test the differences statistically in order to be
able to state definitely that they were or were not due to chance. Ac- 
cordingly, the “t” ratios were computed and the only significant differ- 
ences between the groups related to the items “enjoy the job as a whole” 
and “enjoy rehabilitation work after business hours.” These results 
signify that Group A claimed a higher degree of satisfaction than Group 
B in the job as a whole and in overtime work, and also show that the 
differences in average self-ratings on each of the other items are too small 
to be regarded as statistically significant. 
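
The "t" ratios can be approximated from the group means and standard deviations alone. The article does not state which formula was used; the sketch below assumes the unpooled standard error of a difference between independent means, with N = 100 for Group A and 46 for Group B, and takes its inputs from the legible rows of Table 1. The function name is hypothetical.

```python
from math import sqrt


def t_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two independent means over its standard error."""
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se


# (mean A, S.D. A, mean B, S.D. B) as given in Table 1.
rows = {
    "Writing Case Histories": (12.7, 4.04, 12.9, 5.27),
    "Handling Clerical Details": (10.5, 4.86, 8.9, 5.07),
    "Rehabilitation Work After Business Hours": (12.5, 4.50, 9.4, 5.29),
}
for item, (ma, sa, mb, sb) in rows.items():
    print(item, round(t_ratio(ma, sa, 100, mb, sb, 46), 2))
```

Under this assumption the first two rows come out within about .01 of the tabled values of -.23 and 1.80. The overtime item gives a somewhat larger ratio than the tabled 3.38, though still beyond the 2.61 required at the 1% level; the difference may reflect rounding or a slightly different formula.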


Work Satisfaction and Measured Interests 


In setting up the study, one important hypothesis to be investigated
was that certain Kuder Preference Record results could be used to predict
greater enjoyment with various phases of the total job, as well as with
the job as a whole. Thus, it seemed logical to expect that persons with
higher scores in the Kuder scales which distinguished the counselors
from the general population, namely Social Service, Persuasive, and
Literary, probably would be more satisfied with the job of counseling
as a whole. It also seemed logical to expect that counselors who came
out higher on the Scientific scale of the Kuder would experience more job
interest and satisfaction in experimenting with guidance techniques;
and that counselors higher in the Literary scale of the Kuder would be
more apt to enjoy writing up case histories; and that counselors who
scored higher in the Clerical scale of the Kuder would be less annoyed
with the handling of the clerical details.

The possible relationships between Kuder scores and work satis- 
faction were determined by computing correlation coefficients between 
variables that logically could be suspected of showing a significant degree 
of relationship. 

Because the scores on job satisfaction did not appear to be distributed
normally, as noted above, it was necessary to consider the possibilities
of curvilinear relationships between scores on the Preference Record and
the job satisfaction scales. Parenthetically, it may be mentioned that
the scores on the Preference scales appeared to be normally distributed
with the exception of the Artistic scale, in which the scores seemed to be
skewed positively. A scatter diagram was prepared for each of the
paired variables shown in Table 2 for Group A. An inspection of each
of the diagrams and of the empirical regression lines for the prediction
of job-satisfaction scores from Kuder scores indicated no curvilinearity.
A similar analysis on Group B, with a smaller number of cases than
Group A, did not appear to be warranted. The statistical data are
presented in Table 2. It may be seen that six correlation coefficients
were statistically significant. The only variables showing a significant
correlation for both groups were enjoyment in “contacting employers for
jobs” and the Kuder Persuasive scale. The evidence was not as clean-cut
for the other paired variables which showed a statistically significant
correlation coefficient for one group but not the other. The variables,
enjoyment in “interviewing clients” and the Kuder Social Service scale,
showed a correlation coefficient for Group A which was very close to
zero, although for Group B, the same variables showed a significant
relationship beyond zero. The latter difference is difficult to explain
satisfactorily.


Table 2


Relationship Between Kuder Preference Scores and Job Satisfaction as a
Vocational Rehabilitation Counselor


                                            Kuder
                                            Preference
Satisfaction Scales vs.                     Scales        r             Group

Job as a Whole                              Pers.         .16           A*
Job as a Whole                              Pers.         .12           B**
Job as a Whole                              Soc. Ser.     .13           A
Job as a Whole                              Soc. Ser.     .29           B
Interviewing Clients                        Pers.         .15           A
Interviewing Clients                        Pers.         .00           B
Interviewing Clients                        Soc. Ser.     .06           A
Interviewing Clients                        Soc. Ser.     [illegible]   B
Promoting the Program                       Pers.         .17           A
Promoting the Program                       Pers.         .13           B
Contacting Employers for Jobs               Pers.         .28           A
Contacting Employers for Jobs               Pers.         .30           B
Reading Scientific Literature on Rehab.     Sci.          -.02          A
Reading Scientific Literature on Rehab.     Sci.          .01           B
Reading Scientific Literature on Rehab.     Lit.          .05           A
Reading Scientific Literature on Rehab.     Lit.          .04           B
Experimenting with Guidance Techniques      Sci.          -.01          A
Experimenting with Guidance Techniques      Sci.          -.03          B
Writing Case Histories                      Sci.          -.09          A
Writing Case Histories                      Sci.          -.25          B
Writing Case Histories                      Lit.          .03           A
Writing Case Histories                      Lit.          .15           B
Writing Case Histories                      Soc. Ser.     .00           A
Writing Case Histories                      Soc. Ser.     -.08          B
Handling Clerical Details                   Cler.         [illegible]   A
Handling Clerical Details                   Cler.         .21           B
Rehabilitation Work After Hours             Pers.         .12           A
Rehabilitation Work After Hours             Pers.         .30           B
Rehabilitation Work After Hours             Soc. Ser.     .00           A
Rehabilitation Work After Hours             Soc. Ser.     [illegible]   B


* (N = 100.) Values of correlation coefficients required for statistical significance
are .197 at the 5 per cent level of confidence and .256 at the 1 per cent level of
confidence.

** (N = 46.) Values of correlation coefficients required for statistical significance
are .291 at the 5 per cent level of confidence and [value illegible] at the
1 per cent level of confidence (5, p. 212).
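
The significance thresholds quoted beneath Table 2 (.197 and .256 for N = 100; .291 for N = 46) follow from the usual relation between a product-moment r and the t distribution, t = r·sqrt(N − 2)/sqrt(1 − r²), solved for r. The sketch below is offered only as an illustration: the function name `critical_r` is hypothetical, and the two-tailed critical t values are taken from a standard t table rather than from the article.

```python
import math


def critical_r(t_crit, n):
    """Smallest |r| that reaches significance for a sample of size n,
    given the critical t value for n - 2 degrees of freedom."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Two-tailed critical t values from a standard table (assumed, not from the article).
r_a_05 = critical_r(1.984, 100)  # Group A, 5 per cent level, 98 df
r_a_01 = critical_r(2.627, 100)  # Group A, 1 per cent level, 98 df
r_b_05 = critical_r(2.015, 46)   # Group B, 5 per cent level, 44 df

print(f"Group A: {r_a_05:.3f} (5%), {r_a_01:.3f} (1%)")
print(f"Group B: {r_b_05:.3f} (5%)")
```

Because Group B has fewer than half as many cases as Group A, a coefficient must be roughly half again as large before it counts as significant, which is worth keeping in mind when the two groups appear to disagree.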


Work Efficiency and Measured Interests 


Another important phase of this study was to determine the relation- 
ships of the Kuder interest scores to the supervisory ratings on job 
efficiency. The results should indicate the possible value of the interest 
scores in predicting successful performance in the job, or in particular 
phases of the job. For example, did the high interest scores in the 
Persuasive, Literary, and Social Service scales have a relationship to 
proficiency on the job as a whole, and on such phases of the job as inter- 
viewing clients, interpreting psychological tests, using community re- 
sources, having high production, and doing quality counseling? Similarly,
did high Scientific interest scores correlate with effectiveness in
experimenting on and trying out new professional techniques, etc.?

At present, there are no satisfactory objective devices to evaluate 
job efficiency in vocational rehabilitation counseling. For this reason, 
the graphic rating-scale method was used, accompanied by instructions 
which sought to improve the reliability and validity of the ratings. 

The items rated were: a. counseling efficiency as a whole; b. conducting 
counseling interviews; c. interpreting psychological test results to the
client; d. effective use of community resources; e. imparting occupational 
information to the client; f. writing up case reports; g. handling financial 
records for client’s rehabilitation expenses, making up the flow sheets, 
keeping field sheets up to date; h. making talks, speeches and promoting 
the program to the general public; i. making contacts with employers 
to secure job opportunities for his clients; j. reading current scientific 
articles, books, and reports on rehabilitation topics; k. experimenting on 
and trying out new techniques of counseling and guidance; l. continued
work after regular hours; m. production record on rehabilitations; and 
n. quality of work. 

The frequency distributions of the efficiency ratings as converted
into numerical scores from 0 to 20 were inspected for indications of
normality. The distributions generally appeared to be very peaked at
the center of the scale, usually with shorter peaks at the guide points
designated as “passable” and “very good” on the graphic scales. The
scores on each item spread over almost the entire range, and did not
appear markedly skewed. These indications made it necessary to consider
the possibility that the relationships between job-efficiency and
Kuder scores might be curvilinear. Accordingly, scatter diagrams were
prepared, as well as empirical regression lines for the prediction of
efficiency ratings from Preference scores. Inspection of the regression
lines indicated no curvilinear relationships between the job-efficiency and
1st Kuder scores. There seemed to be no reason to assume that the


Table 3


Relationship Between Kuder Preference Scores and Job Efficiency of Vocational
Rehabilitation Counselors as Rated by Supervisors


Job Efficiency vs. Kuder Scale                       r on 2nd Kuder   r on 1st Kuder

Whole Job vs. Mech.*                                 -.12             .04
Whole Job vs. Comp.                                  .08              [illegible]
Whole Job vs. Sci.                                   .02              -.03
Whole Job vs. Pers.                                  .01              .10
Whole Job vs. Art.                                   -.09             [illegible]
Whole Job vs. Lit.                                   .06              .03
Whole Job vs. Mus.                                   .07              .02
Whole Job vs. Soc. Ser.                              [illegible]      -.02
Whole Job vs. Cler.                                  .13              .07
Interviewing vs. Pers.                               .14              .16
Interviewing vs. Soc. Ser.                           .09              .00
Interpreting Tests vs. Sci.                          .04              .06
Use of Community Resources vs. Soc. Ser.             .05              .13
Imparting Occupational Information vs. Mech.         .13              [illegible]
Imparting Occupational Information vs. Comp.         .06              [illegible]
Imparting Occupational Information vs. Sci.          .14              .03
Imparting Occupational Information vs. Pers.         .05              [illegible]
Imparting Occupational Information vs. Soc. Ser.     -.09             .07
Imparting Occupational Information vs. Cler.         .00              .00
Writing Case Histories vs. Sci.                      -.03             [illegible]
Writing Case Histories vs. Lit.                      .07              [illegible]
Writing Case Histories vs. Soc. Ser.                 -.05             [illegible]
Handling Records vs. Cler.                           .05              [illegible]
Publicly Promoting the Program vs. Pers.             .24              [illegible]
Contacting Employers vs. Pers.                       [illegible]      [illegible]
Reading Scientific Literature vs. Sci.               -.07             [illegible]
Reading Scientific Literature vs. Lit.               [illegible]      [illegible]
Experimenting with Guidance Techniques vs. Sci.      .04              [illegible]
Rehabilitation After Hours vs. Soc. Ser.             [illegible]      [illegible]
Production Record vs. Pers.                          [illegible]      [illegible]
Production Record vs. Soc. Ser.                      [illegible]      [illegible]
Quality of Work vs. Pers.                            [illegible]      [illegible]
Quality of Work vs. Soc. Ser.                        [illegible]      [illegible]


* Values required for statistical significance at 5% level of confidence = .197; at
1% level of confidence = .256 (5, p. 212).



relationships would be of a different type between the job-efficiency and 
the 2nd Kuder scores. 

The product-moment r’s found between the Preference Record scores, 
both first and second tests, and the supervisory ratings of job efficiency 
are presented in Table 3. It will be seen that correlation coefficients 
were computed only between those pairs of variables which might be 
suspected of yielding statistically significant relationships. Of the 66 
coefficients, only five were statistically significant. 

These results show that higher Kuder scores on the Persuasive scale 
tend slightly but definitely to indicate better job performance in promot- 
ing the program to the general public, and that higher scores on the 
Literary scale tend slightly but definitely to indicate greater activity in 
keeping up with the scientific literature in the field of rehabilitation. The 
evidence is not as clean-cut for the statement that there is a real relation- 
ship between job efficiency in contacting employers for jobs for handi- 
capped clients and higher scores on the Persuasive scale of the Preference 
Record. The latter variables are found to be related at the five per cent 
level of confidence when the first Kuder test score is considered, but there 
is no statistical significance when the second Kuder score is involved. 


Supervisors’ Ratings on Job Efficiency Elements 


It is interesting to study the distributions of the supervisory ratings 
on the several scales. A comparison of the averages of the efficiency 
ratings on the different items may indicate that supervisors are more 
satisfied with counselors’ performance in some respects than with others. 
Perhaps the differences in average scores roughly indicate relative 
strengths and weaknesses in counselors’ performance in civilian rehabilita- 
tion counseling at least as regarded by the supervisors. The foreword 
which accompanied the efficiency rating forms instructed the supervisors 
to rate their counselors so that the middle of the scale would be
approximately the average for all counselors. Upon this background of
instructions, the ratings on the scales resulted in the scores presented in
Table 4.

According to the average ratings of the supervisors, they were relatively
well satisfied with the “quality of work” done by the rehabilitation
counselors. This item ranked first in order of magnitude. The supervisors
also seemed to be relatively pleased with the counselors’ efforts
in the aspects of the job having to do with community contacts because
the items next in order of magnitude were “use of community resources,”
and “contacting employers for jobs.” The “production record” was
less satisfactory than the “quality of work.” The counselors were rated
lowest in the items, “experimenting on and trying out new professional
techniques” and “reading scientific publications on rehabilitation topics.”
They also were more unsatisfactory on “promoting the program to the
general public” and “interpreting psychological tests.” Although the
statistical data are not presented in this article, it has been found that
the differences between means having rank orders (5) and above as
shown in Table 4, as compared with means having rank orders (10) and
below, are statistically significant beyond the 1 per cent level of confidence.
This signifies that in a similar sample the differences between the more
extreme means will appear again if the experiment were to be tried over
again under the same conditions.


Table 4

Supervisory Ratings on Counselors’ Job Performance in Civilian Rehabilitation Work


Item Rated                                            Mean*   S.D.

Quality of Work                                       12.6    3.46
Using Community Resources                             12.2    3.89
Contacting Employers for Jobs                         12.1    4.29
Interviewing Clients                                  12.0    3.52
Writing Case Histories                                12.0    3.72
Job as a Whole                                        12.0    3.50
Handling Records                                      11.6    3.60
Production Record                                     11.5    4.60
Imparting Occupational Information                    11.3    3.86
Interpreting Tests                                    11.0    3.47
Rehabilitation Work After Hours                       10.5    4.12
Promoting Program to Public                           10.5    4.14
Reading Scientific Publications on Rehabilitation     10.4    3.10
Experimenting with Guidance Techniques                10.0    3.62


* The ratings were converted into numerical scores from 0 to 20.


An analysis of the magnitude of the standard deviations for all the
items makes interesting material for a further observation. The highest
standard deviations appeared for the items, “production record,”
“contacting employers for jobs,” “promoting the program to the public,” and
doing “rehabilitation work after hours.” The lowest standard deviations
appeared for the items, “reading scientific publications on rehabilitation,”
“quality of work,” “job as a whole,” and “interviewing clients.” Two
reasons may be ascribed for these differences in the magnitude of the
standard scatter of scores. One is that the group of the highest standard
deviations relates to items more easily counted numerically, and that
the second group above relates to more difficult qualitative judgments. In
other words, supervisors spread out their ratings on counselors’ job
efficiency more on “production record” than on “quality of work”
because the former is more objective. A second possible reason for the
higher and lower magnitudes of the standard deviations is that the
counselors are more alike in the group of items including “quality of
work” than in the group of items including “production record.” The
first reason is the preferred explanation for the differences in standard
deviations.

Summary 


Vocational Rehabilitation counselors were requested to take the 
Kuder Preference Record, to fill out a Survey Sheet which indicated their 
work satisfaction on the job, and also were rated for work efficiency by 
their supervisors. It was found that: 


1. The counselors derived a high degree of satisfaction from their job 
as a whole, and from such phases of it as interviewing clients, contacting 
employers for jobs, promoting the program to the public, experimenting 
with guidance techniques and reading scientific literature. They least 
enjoyed the handling of clerical details, overtime work, and the writing 
of case histories. 

2. Higher scores on particular Kuder scales had low but significant 
relationships to work satisfaction for only several aspects of the coun- 
selor’s job. However, the magnitude of the correlations was much too 
low for purposes of individual prediction. Higher scores on the Kuder 
Persuasive scale indicated greater enjoyment in contacting employers for 
jobs. Other evidences of significant relationship were not as consistent, 
appearing for one experimental group but not the other. 

3. Higher scores on particular Kuder scales had low but significant
relationships to work efficiency for only several aspects of the counselor’s 
job. However, the correlation coefficients were too low for purposes of 
individual prediction. Higher scores on the Kuder Persuasive scale 
indicated greater work efficiency in promoting the program to the public; 
higher scores on the Kuder Literary scale indicated greater efficiency in 
keeping up with literature on rehabilitation. 

4. The efficiency of counselors was rated more alike in such job 
elements as quality of work, interviewing clients, keeping abreast of 
modern scientific literature, and in the job as a whole. The counselors 
were rated less alike in such aspects of job efficiency as production record, 
contacting employers for jobs, promoting the program to the public, and 
working after hours. These differences probably were due to greater
difficulty in rating the counselors on items depending upon qualitative 
rather than quantitative judgments of job performance. 

Received December 20, 1948. 


References 


1. DiMichael, S. G. The professed and measured interests of vocational rehabilitation
counselors. Educ. Psychol. Measmt., 1949, 9: 59-72.
2. Guilford, J. P. Fundamental statistics in psychology and education. New York:
McGraw-Hill Book Co., Inc., 1942.
3. Hahn, M. E. and Williams, C. T. The measured interests of marine corps women
reservists. J. appl. Psychol., 1945, 29: 198-211.
4. Kuder, G. F. Revised Manual, Kuder Preference Record. Chicago: Science Research
Associates, 1946.
5. Lindquist, E. F. Statistical analysis in educational research. Boston: Houghton
Mifflin Co., 1940.
6. Super, D. E. The Kuder Preference Record in vocational diagnosis. J. consult.
Psychol., 1947, 11: 184-194.


Certain Rorschach Response Categories and Mental Abilities 


J. R. Wittenborn 
Yale University 


It is common practice among Rorschach technicians to include, as a 
part of their personality appraisals, some remarks concerning the subjects’ 
“intelligence,” “intellectual potential,” “mental capacity,” or
“intellectual efficiency.” Such evaluations are derived by the examiners
from a variety of considerations. 

The Rorschach scoring categories most commonly used in estimating 
mental capacity or achievements are: a. the total number of responses 
(R); b. the number of whole responses, i.e., responses based on the whole 
card (W); c. the number of responses in which Human Movement is 
seen (M); and d. the form level of responses, i.e., the accuracy and 
detail with which forms are seen. 

These aspects of a Rorschach record are not employed independently 
in making appraisals. For example, the number of whole responses is 
dependent upon the accuracy with which forms are perceived. Since 
ability estimates offered by Rorschach workers make use of a wide variety 
of informal cues, it must be emphasized that it is not a purpose of the 
present investigation to determine the nature of the relationship between 
mental test scores and a mental ability estimate based upon a total 
Rorschach evaluation. 

The purpose of the present investigation is to examine the ability 
implications of certain objectively determined, quantitatively expressed 
classes of response which are unique to the Rorschach. Specifically the 
investigation is concerned with the location (i.e., the portion of the blot 
employed) and the determinant (i.e., the shading, color or projected 
movement employed in forming a response) factors; these are unique to 
the Rorschach. The content of perceptions, as well as their accuracy 
(form level), are general factors in projection and their significance is not 
unique to the Rorschach. Therefore, the content and form level of 
responses are not included in the present analysis. 

In meeting the purposes of the experiment, the analysis of data is 
conducted with respect to the following questions: 


1. What is the order of the relationships between certain Rorschach
response scoring categories and test evidence of mental ability? Are
they negligible relationships which permit the kind of gross distinctions
between ability levels that can be made from casual observations, or do
they provide refined distinctions comparable to those provided by mental
tests? If the relationships are of high order, it should be generally known
so that they can be put to extensive use. Since Rorschach responses
appear to be less a function of specific education experiences than the
performances currently sampled by most mental tests, it is conceivable
that the demonstration of high relationships could influence future
mental test procedures.

2. What is the nature of the relationship between various classes of 
mental ability and various Rorschach response categories? Intelligence, 
general ability, etc., are words for groups of human abilities, but the 
groups have no standard consistency. If a full appreciation of a relation- 
ship between a Rorschach response category and a mental ability is 
to be had, the nature of the mental ability in question must be specified. 
Accordingly, in the present analysis measures of verbal, spatial, and
numerical abilities are employed. In addition, general measures of
scholastic ability are included. If a pattern of relationship could be
demonstrated between certain Rorschach response categories and certain
classes of mental ability, an improved understanding of both Rorschach
responses and of the mental abilities in question might result.


The Experimental Plan 


The subjects were a heterogeneous group of 68 Yale students who
had been in a speeded reading course or had consulted the writer. The
Rorschach tests used in this analysis were administered and scored by
Klopfer-trained examiners.

The ability data employed in the analysis, the results of the College
Entrance Examinations and the results of the Yale Freshman Aptitude
Tests, were taken from the files of the Yale University Student Appointment
Bureau. Scores for the following variables were taken from each
student’s file and used in the analysis:

I. Scholastic Ability: 1. First Semester Freshman Year grade average;
and 2. General Scholastic Prediction for Freshman Year.
II. Verbal Ability: 1. College Entrance Scholastic Aptitude Verbal
test; 2. College Entrance English Essay test; and 3. Yale Verbal Reasoning
test.
III. Numerical Ability: 1. College Entrance Scholastic Aptitude
Mathematical test; and 2. Yale Quantitative Reasoning test.
IV. Spatial Ability: 1. Yale Spatial Visualization test; and 2. Yale
Mechanical Ingenuity test.

Using only Yale undergraduates as subjects restricts the range of
ability sampled.¹ Probably no member of the present group has a
verbal IQ as low as 115. The range of ability sampled is less restricted
than at first might be supposed, however; the high levels are very well
represented. Moreover, some of the tests which are not relevant to
general academic achievement, e.g., the measures of spatial ability, may
include a very wide range of scores. In general it may be claimed that
using a variety of tests which sample relatively homogeneous, specifiable
abilities results in less range restriction than would result in using one
general ability score, e.g., an IQ.

There are two sets of considerations to be observed in generalizing from
the results of the present study: a. If no significant relationships are
found in the present sample, it is unlikely that important linear
relationships would be found in a more heterogeneous sample; and b. If the
relationships in the present sample are highly significant, it is possible
that they would have a practical predictive value in a more heterogeneous
sample.

An answer to the two questions raised in the introduction calls for an 
examination of the relationships between each of the nine mental ability 
measures and each of eighteen Rorschach categories.²

Since there were nine mental tests to be correlated with eighteen
Rorschach categories (a total of 162 determinations), it was decided first
to make the simplest preliminary examination of each possible relationship,
and subsequently to make a thorough study of the promising relationships.
For this purpose the 10 highest and 10 lowest people in each
test distribution were selected. This provided nine different pairs
of high and low standing students. Scores on each Rorschach
category were obtained for the high and low standing groups.


Analysis of Data 
Table 1 shows the average number of Rorschach responses for the ten
students who scored highest and for the ten students who scored lowest
on each of the tests. The two measures of scholastic ability are not
one is merely a grade average and the other is a prediction based 


restriction does not preclude the possibility that these tests show high 
in a sample of Yale undergraduates. Boan bE hk above teste have inter- 


² The categories are: 1. W, Whole Blot; 2. D, Large Usual Detail; 3. d, Small Usual
Detail; 4. Dd, Unusual Detail; 5. S, White Space; 6. M, Human Movement; 7. FM,
Animals in Action; 8. m, Abstract or Inanimate Movement; 9. k, Shading on a Three
Dimensional Expanse projected on a two dimensional plane; 10. K, Shading or
Diffusion; 11. FK, Shading in Three Dimensional Expanse in Vista or perspective;
12. F, Form [remainder of the list illegible apart from “Achromatic Surface Color”].


cant when a t test was made.


The total number of Rorschach responses was found, upon inspection,
to be positively skewed for all groups, thus showing a t test of the
differences between the means of the total number of responses to be inap-


th logarithms of the total number of responses, 


Table 1

[Average number of Rorschach responses for the high and low scoring groups on
each test; caption and values not recoverable.]



between the low and high level ability groups, the findings offer little 
support for the practice of using the total number of Rorschach responses 
as an evidence for mental ability among individual college students. 

Because of the indication of a slight positive relationship between 
total number of responses and measures of mental ability, the location 
and determinant scores for each individual were expressed as a per cent 
of his total number of responses.³ Both the raw scores and the per cent
of total scores were analyzed. Despite the large number of differences 
examined, very few trends were discovered and almost none of them 
was significant. Only the promising trends will be presented in the 
following paragraphs. 

Table 2


A Comparison Between Pairs of High and Low Scoring Groups on the Basis of Both
the Number and the Per Cent of Human Movement Responses


                                          High Group   Low Group   Difference    Difference
Test                                      No. M        No. M       No. M         % M

I. Scholastic
   1. First Semester Average              5.3          3.6         1.7           4.5
   2. General Scholastic Prediction       7.9          4.6         3.3           1.9
II. Verbal
   1. Scholastic Aptitude Verbal          8.8          4.2         4.6           [illegible]
   2. English Essay                       8.2          8.1         [illegible]   2.6
   3. Verbal Reasoning                    8.1          4.6         3.5           3.7
III. Numerical
   1. Scholastic Aptitude Mathematical    8.3          4.3         4.0           1.8
   2. Quantitative Reasoning              8.2          3.5         4.7           [illegible]
IV. Spatial Ability
   1. Spatial Visualization               6.5          4.3         2.2           4.1
   2. Mechanical Ingenuity                7.9          5.4         2.5           5.0


Table 2 indicates the nature of the relationship between the number 
of Human Movement (M) responses and the mental ability measures. 
It is apparent that there is a general tendency for the number of Human
Movement responses to be positively related with mental ability measure- 
ments. This tendency is not wholly due to the fact that mental ability 
is slightly related to total number of responses; this is indicated by the 
consistent positive differences between the groups in per cent Human 

³ In his study of the relationships between Bellevue-Wechsler scores and Beck’s
Rorschach scoring factors, Wishner (8) makes no adjustment for the manner in which
some of the scoring factors may be influenced by the total number of responses (R).
This is regrettable because his data suggest that the validity he claims for Z could
be largely if not entirely due to R.


Table 3

A Comparison Between Pairs of High and Low Scoring Groups on the Basis
of the Per Cent of Whole Responses

                                          High Group   Low Group   Difference
Test                                      % W          % W         % W

I. Scholastic
   1. First Semester Average              42.3         40.2        2.1
   2. General Scholastic Prediction       34.2         42.9        -8.7
II. Verbal
   1. Scholastic Aptitude Verbal          34.2         38.4        -4.2
   2. English Essay                       33.5         27.2        6.3
   3. Verbal Reasoning                    29.0         35.0        -6.0
III. Numerical
   1. Scholastic Aptitude Mathematical    29.5         34.9        -5.4
   2. Quantitative Reasoning              32.5         30.5        2.0
IV. Spatial Ability
   1. Spatial Visualization               33.0         32.8        .2
   2. Mechanical Ingenuity                24.5         37.0        -12.5

Table 4


Evidence for Relationship Between Tendency to Give Achromatic Color Responses and
Tendency to be in the High or Low Scoring Groups for Each of the Mental Tests

[Column headings and values not recoverable; the row labels follow.]

I. Scholastic
   1. First Semester Average
   2. General Scholastic Prediction
II. Verbal
   1. Scholastic Aptitude Verbal
   2. English Essay
   3. Verbal Reasoning
III. Numerical
   1. Scholastic Aptitude Mathematical
   2. Quantitative Reasoning
IV. Spatial Ability
   1. Spatial Visualization
   2. Mechanical Ingenuity


* Without Yates correction.



Movement responses. The number of Human Movement responses, like
the total number of responses, proved to be positively skewed; as a
consequence a t test was based on the logarithms of the per cent of Human
Movement scores. None of the differences proved to be significant at
the five per cent level.
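
The logarithmic transformation described above can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the scores are hypothetical and are not data from the study, the function name `log_t` is invented, and the pooled-variance form of the t ratio is assumed because the article does not say which form was used. Scores of zero have no logarithm and would need separate handling.

```python
import math
from statistics import mean, variance


def log_t(high, low):
    """Pooled-variance t ratio computed on log10 of the scores.
    Every score must be greater than zero."""
    log_high = [math.log10(x) for x in high]
    log_low = [math.log10(x) for x in low]
    n_h, n_l = len(log_high), len(log_low)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled = ((n_h - 1) * variance(log_high)
              + (n_l - 1) * variance(log_low)) / (n_h + n_l - 2)
    se_diff = math.sqrt(pooled * (1 / n_h + 1 / n_l))
    return (mean(log_high) - mean(log_low)) / se_diff


# Hypothetical per cent M scores, NOT data from the study.
high_group = [10.0, 20.0, 40.0]
low_group = [5.0, 10.0, 20.0]

t_value = log_t(high_group, low_group)
print(f"t on log scores = {t_value:.3f}")
```

Taking logarithms pulls in the long upper tail of a positively skewed distribution, so that a few very productive records no longer dominate the group means and variances.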

The number of whole responses showed little promise of being related 
with the mental ability scores. Table 3 shows the ambiguous finding 
for per cent whole responses. 

Only one of the other scoring categories for determinant or location 
factors showed evidence of being related with mental ability. This was 
the number of achromatic color responses (C’). 

Since for any individual the number of achromatic color responses
(C’) was small, no t test was feasible and no correction for the influence
of the total number of responses on the number of achromatic color
responses was made. The reliability of the relationship between the
number of achromatic color responses and mental ability scores was
examined by means of a χ² test of independence (Table 4).
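
A χ² test of independence for a fourfold table, with and without the Yates correction mentioned in the footnote to Table 4, can be sketched as follows. The counts are hypothetical and are not the data behind Table 4; the function name `chi_square_2x2` and the way the table is laid out (high or low scoring group against giving or not giving an achromatic color response) are assumptions made for illustration.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates continuity correction: shrink |ad - bc| by n / 2, never below zero.
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts, NOT data from the study:
# rows    = high scoring group, low scoring group (10 students each)
# columns = gave at least one C' response, gave none
plain = chi_square_2x2(7, 3, 3, 7)
corrected = chi_square_2x2(7, 3, 3, 7, yates=True)

CHI2_05_1DF = 3.841  # standard 5 per cent critical value for 1 degree of freedom

print(f"without Yates correction: {plain:.2f}")
print(f"with Yates correction:    {corrected:.2f}")
print("significant at 5%:", plain >= CHI2_05_1DF)
```

With groups of only ten students each, the correction lowers the statistic considerably, so a value reported without it is the more lenient of the two.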

Discussion 

The experimental findings are discussed with respect to the two 
questions to which the experiment is specifically relevant: 1. What is 
the order of any linear relationship between a Rorschach response 
category and test evidence for mental ability? 2. What is the pattern
of linear relationships between various classes of mental ability and 
various Rorschach response categories? 

With respect to the first question, it is apparent that no linear
relationship of sufficient strength to justify individual evaluation exists
between any type of mental ability sampled and any one of the usual
Rorschach location or determinant scoring categories. The qualification
“linear” is offered because it is possible that at a low level of ability a
more appreciable relationship exists between mental ability and
frequency of responses in certain of the Rorschach categories. Such
discontinuous or curvilinear relationships have not proved to be important
in mental ability studies, however.

Because of the paucity of evidence for reliable relationships between 
mental ability and the selected Rorschach response categories, the second 
question becomes irrelevant. The slight trends observed give no hint 
that certain types of responses are correlated with certain types of ability. 

Obviously the present findings do not preclude the possibility that
the Rorschach may be used in some manner or other to predict some
aspect of mental ability. The present study does indicate the limited
value of Rorschach location and determinant categories as evidence for
mental ability. This suggests that the accuracy of Rorschach perceptions
(form level ratings (4)) and other cues are the primary basis for any
valid appraisal of mental ability. Such cues are not particularly
objective; their evaluation is informal and not well standardized. The
reliability of form level ratings accrues from the consistency of the
examiner,⁴ and their validity is dependent upon his judgment. Thus
it appears that the most formal and objective aspects of a Rorschach
protocol (the location and determinant category scores) have almost
no validity. The remaining factors (form level and the other purely
qualitative cues) are likely to be unreliable or, at best, to possess a
reliability which is more a characteristic of the examiner than of the
Rorschach procedure. Concerning the possible validity of accuracy of
perceptions as an evidence for mental ability, it is of interest that Beck’s
(1) F plus % (probably more reliable than Klopfer’s form level ratings)
was found by Hertz (2) to be correlated with mental ability; Wishner
(8) could not confirm this, however.


Summary and Conclusions 


The present study is an examination of the relationships between 
measures of scholastic, verbal, numerical, and spatial abilities and the 
commonly used Rorschach scoring categories for location and 
determinant factors. The subjects were a sample of sixty-eight Yale students. 
The findings may be summarized in the following manner: 


1. Although the total number of responses, the number of whole 
responses, or the number of Human Movement responses is often used 
as a part of the procedure for estimating mental ability from Rorschach 
protocols, in the present sample none of them has sufficient validity 
to justify use for distinguishing between individual college students of 
different levels of ability. 
2. If the relationship between any Rorschach location or 
determinant category and any of the types of mental ability used in the 
present study is linear, the evidence from this sample indicates that their 
value for predicting individual mental ability is so scant as to make their 
use at any ability level uneconomical and misleading. 

3. The present negative or negligible findings do not preclude the 
possibility that some examiners, employing other aspects of the protocol 
or clues not gained from the os responses, may arrive at valid 
estimates of some sort of mental ability. 

4. There is evidence of a slight tendency for the total number of 
Rorschach responses (R) to be positively correlated with several measures 
of mental ability. This finding requires that all of the other comparisons 


¹ This was recognized by Wishner (8). 




be corrected for differences in total number of responses in order 
to eliminate the spurious effect of a third variable. 

5. Two of the Rorschach scoring categories based on the determinants 
of a response (the color, shading, or movement factors) show evidence for 
a slight positive relationship with measures of mental ability. They are 
the number of Human Movement responses and the number of achro- 
matic color responses. 

6. None of the Rorschach categories based on the location factor 
(portion of the card used in forming a response) is related to any of 
the measures of mental ability. Significant trends were absent not only 
among the skewed raw scores but among their logarithms as well. 


Received November 18, 1948. 
References 


1. Beck, S. J. Rorschach’s test: Vols. I and II. New York: Grune and Stratton, 1945. 

2. Hertz, M. R. The Rorschach Ink Blot Test: A historical summary. Psychol. Bull., 
1935, 32, 33-66. 

3. Hertz, M. R. Rorschach norms for an adolescent group. Child Develop., 1935, 6, 
69-76. 

4. Klopfer, B., and Davidson, H. H. Form level rating. Rorschach Res. Exch., 1944, 
8, 164-177. 

5. Klopfer, B., and Kelley, D. M. The Rorschach Technique. New York: World Book, 
1942. 

6. Rapaport, D. Diagnostic psychological testing: Vols. I and II. Chicago: Year Book 
Pub., 1945. 

7. Rorschach, H. Psychodiagnostics. Berne: Hans Huber, 1942. 

8. Wishner, Julius. Rorschach intellectual indicators in neurotics. Amer. J. 
Orthopsychiatry, 1948, 18, 265-279. 


Modification of Academic Performance 
through Personal Interview * 


Alex C. Sherriffs 
University of California 


Among the many problems facing university teachers today is that 
of the large class. Each year finds a greater proportion of university 
courses with enrollments in the hundreds. Some of the larger 
universities report over a thousand students in the beginning courses of 
certain popular fields. 

Most instructors feel that the large class is an educational hazard. 
The negative aspects perhaps most frequently cited include the minimal 
opportunity for student participation during lecture hours, the necessity 
for using recognition type examinations which usually do not call for 
the integration of course material, serving only the purpose of providing 
a basis for grading students, and the essential lack of contact between 
individual students and the course instructor. 

It is with one phase of this latter aspect that this paper is concerned. 
The experiment reported here is intended to throw some light on the 
significance of the contact of individual students with their instructor, 
especially in the situation of the large class. 

This experiment was formulated on the basis of three hypotheses. 
These were: (1) that those students of a large class who felt themselves 
to be known as individuals to their instructor would demonstrate more 
effective learning of course material than would their fellows not so 
known; (2) that there would be demonstrable individual differences in 
the effects of being known to the instructor; and (3) that such individual 
differences could be predicted with some accuracy. 


Procedure 

To test the first hypothesis, it was decided to subject a random sample 
of students in a large class to a sixty-minute interview by their instructor 
during the week following the first midterm examination. Scores on 
this examination would serve as a baseline against which to compare the 
performance of these students on examinations following the interview. 
The remainder of the class would serve as the control group. 

* The writer is indebted to Edna Adelson and to Joseph Adelson for technical 
assistance in this study. 


To test the second and third hypotheses, judgments would be ob- 
tained on the students interviewed as to certain personality variables. 
These variables would be characteristics considered likely to modify the 
effect of an interview contact by the instructor. Students high on such 
characteristics would be compared in their performance on examinations 
taken after the time of the interviews with those low on these character- 
istics, and both of these groups would be compared with the non-inter- 
viewed students of the class. 


The Subjects 


The class chosen for this experiment was the beginning survey course 
in psychology at the University of California. This class was chosen 
simply because of its availability and its large enrollment. The experi- 
menter was the instructor, and some 257 students were registered and 
took the examinations throughout the course. 

This course is open only to those not intending to major in psychology. 
The students were all freshmen and sophomores who ranged in age from 
17 to 24, with a mean age of 19.0 for the group. 

The course extended over a sixteen-week period, with three lectures 
each week, and with objective recognition type examinations. Midterms 
were administered during the fifth and tenth weeks, and the final examina- 
tion was held at the end of the sixteenth week. All students in the 
course were required to serve as subjects for two hours of laboratory 
experiment during the semester. The interview for the subjects of the 
present investigation counted as one of the regular laboratory hours. 

A sample of thirty-four students was selected for the experimental 
group by including every eighth student on the class roll. A check 
indicated the sample to be representative of the class as a whole on the 
variables of age, sex, and academic major. Of importance was the fact 
that the distribution of scores made on the first midterm by this group 
was highly similar to that made by the rest of the class (see Table 1). 


Table 1 
Comparison of the Experimental Group with the Remainder of the Class 
on the First Midterm Examination 

                                 Mean     S.D.      t 

Experimental Group (N = 34)      49.5     4.82 
                                                   .17 
Remainder of Class (N = 223)     49.3     6.80 




The Interview and the Personality Variables. Since the main function 
of the interview was to cause each student of the experimental group to 
feel that he was known to the instructor as a definite individual, and 
to make certain personality judgments, the interviews were directed 
toward procuring life history and attitudinal material. The instructor 
carefully avoided discussion of material of the course or of the student’s 
reaction to the class. The explanation given the student for the 
interview was that it was desired to know as much as possible of the interests 
and backgrounds of those enrolling in this course. 

The personality variables chosen from among those likely to have 
significance in relation to the effect of the interview on the student’s 
academic performance follow: 


1. Self-tension. The amount of tension felt by the student as to his 
own adequacy and worth. 

2. Family-tension. The amount of tension felt by the student in his 
family relations, in regard both to parents and siblings. 

3. Social-tension. The amount of tension felt by the student in his 
social relations. 

4. Over-all tension. The general level of tension and anxiety under 
which the student functions, taking into account the above 
three areas. 

5. Achievement need. The importance to the student of high 
academic grades. 

6. Affection need. The importance to the student of receiving a 
constant supply of warmth and affection from others. 

7. Praise need. The importance to the student of praise and 
recognition from others. 

Obviously these variables would not be completely independent, but 
it seemed that their individual meaning was sufficiently separate to be 
useful for this study. Intercorrelation of measures of variables was no 
handicap so long as a true relationship was represented. The real 
difficulty lay in not having independent observers to obtain the different 
measures. By the nature of the study only the course instructor could 
interview the group of subjects. The amount of intercorrelation 
resulting from “halo effect” operating on the one interviewer cannot be 
determined. 

Five-point rating scales with defined points were utilized for the 
judgments of the four tension variables. These were rating scales 
previously found to be useful by the writer.² Seven-point rating scales, 
with the points defined in terms of probability of occurrence, were 
employed for the ratings of the subjects on the three “need” variables. 
These rating scales follow closely the method outlined by Murray.³ 

² Sherriffs, A. C. The “Intuition Questionnaire”: A new projective test. J. abnorm. 
Psychol., 1948, 43, 326-337. 


Results * 


1. First Midterm Examination. The first task was to assess differ- 
ences on the first midterm examination between the randomly selected 
sample and the remainder of the class. This examination, it will be 
remembered, was administered before the members of the experimental 
group were interviewed. The pertinent data for this comparison appear 
in Table 1. 

This comparison does not suggest that the experimental group was 
different from the remainder of the class in terms of performance on the 
first midterm examination. 
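
The Table 1 comparison can be approximated from the published summary figures. The following is a minimal sketch, assuming a pooled-variance two-sample t test and standard deviations computed with N - 1 in the denominator. The paper states only that the t test was used, so both points are assumptions, and the function name `pooled_t` is hypothetical.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t from summary statistics, pooling the two variances.

    Assumes each S.D. was computed with N - 1 in the denominator.
    """
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err

# Table 1: first midterm, experimental group vs. remainder of class
t_first_midterm = pooled_t(49.5, 4.82, 34, 49.3, 6.80, 223)
print(round(t_first_midterm, 2))
```

Under these assumptions the result rounds to the .17 shown in Table 1, far below the 1.96 needed for significance at the 5 per cent level.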


Table 2 

Comparison of the Experimental Group with the Remainder of the Class in Terms of 
the Shifts in Scores from the First Midterm to Subsequent Examinations 

                                Midterm I to Midterm II       Midterm I to Final 
                                Mean     S.D.      t          Mean     S.D.      t 

Experimental Group (N = 34)     +6.6     5.52                 +80.4     9.99 
                                                  2.39*                          1.88 
Remainder of Class (N = 223)    +4.3     5.31                 +76.3    12.08 

* Significant at the 2 per cent confidence level. 


2. Performance on Second Midterm Examination and on Final Ex- 
amination as Compared with Performance before the Interview. Suggestions 
as to the effect of the interview on the performance of the experimental 
group of subjects come from comparisons of this group with the re- 
mainder of the class in their functioning on later examinations relative 
to their functioning on the first, pre-interview, midterm. The mean 


³ Murray, H. A. Explorations in personality. New York: Oxford University Press, 


* All estimates appearing in this paper of the significance of the differences between 
means are based on the t test. Comparisons involving a two part split of the thirty-four 
subject experimental group require a t of 2.04 to be significant at the 5 per cent 
confidence level, and a t of 2.74 to be significant at the 1 per cent confidence level. Those 
comparisons which involve the 223 subjects not included in the experimental group 
require a t of 1.96 to be significant at the 5 per cent level, and a t of 2.58 to be significant 
at the 1 per cent level. 




differences in points scored on the first midterm examination and those 
scored on the second midterm and those scored on the final examination 
are presented in Table 2. The variabilities of the shifts in performance, 
and the significance of the differences between the shifts of the 
experimental group and of the remainder of the class are also shown. 

These comparisons suggest that the interviewed group of students 
improved more than did the remainder of the class in their performance 
on the second midterm examination held four weeks after their contact 
with the instructor. The difference in improvement is significant at 
the 2 per cent confidence level. The difference still favors the 
experimental group at the time of the final examination some ten weeks later, 
but this latter difference is not significant at the 5 per cent level. 


Relationship of Rated Personality Variables 
to Effects of Interview on Performance 


Distributions of ratings were made for each of the seven personality 
variables to be studied. These distributions were then considered 
separately and split in each case so as most closely to accomplish a 50-50 
division of the subjects on that particular variable. Comparisons could 
then be made of the examination performance of those subjects rated 
higher on each variable with the examination performance of those rated 
lower. 

1. Relationship of Rated Personality Variables to Performance on the 
First Midterm Examination. The means and standard deviations of the 
scores made on the first midterm examination by those rated higher and 
those lower for each personality variable are shown in Table 3. 
Estimates as to the significance of the differences between these means are 
also indicated. 
Table 3 

Performance on the First Midterm Examination of Those of the Experimental Group 
Rated Higher and of Those Rated Lower on Seven Personality Variables 

                        Higher Ratings                  Lower Ratings 
Variable             N     Mean    S.D.      t         N     Mean    S.D. 

Self Tension         16    48.1    5.82    1.62        18    50.8    3.24 
Family Tension       14    48.3    5.54    1.25        20     --      -- 
Social Tension       17    48.1    5.06    1.73        17     --      -- 
Overall Tension      13    47.5     --     2.03        21     --      -- 
Achievement Need     13    50.0    4.77     .44        21     --      -- 
Affection Need       10    45.3    4.67    3.22**      24    51.3    4.38 
Praise Need          12    46.8    4.88    2.65*       22    51.0    4.05 

 * Significant at the 5 per cent confidence level. 
** Significant at the 1 per cent confidence level. 
-- Entry illegible in the source. 




These comparisons reveal that the students rated as most tense, 
generally, and in each of the three areas of tension, did less well on the 
first midterm than did their fellow students who were rated as less tense. 
Those students judged most strongly to need affection and praise did less 
well than did those judged to have less of these needs. In the case of 
the achievement need we find that those students with higher ratings 
performed better on the examination. The differences between means 
are significant at the 5 per cent level of confidence in the cases of overall 
tension and need for praise, and at the 1 per cent level in the case of need 
for affection. 


Table 4 

Relationship to Rated Personality Variables of Shifts in Scores 
from First Midterm to Subsequent Examinations* 

                       Midterm I to Midterm II**          Midterm I to Final** 
                       Higher          Lower              Higher          Lower 
                       Ratings         Ratings            Ratings         Ratings 
Variable             Mean   S.D.     Mean   S.D.        Mean   S.D.     Mean   S.D. 

Self Tension          8.8   4.78      4.7   5.45        83.5    9.12    77.7    9.92 
Family Tension        8.9   4.79      5.0   5.43        80.9   10.30    80.1    9.72 
Social Tension        8.3   4.93      4.9   5.59        82.0    9.32    78.8   10.36 
Overall Tension       8.7   4.37      5.3   5.77        81.1   10.01    80.0    9.94 
Achievement Need      6.6   3.81      6.7   6.35        83.8   10.56    78.3    8.97 
Affection Need       10.8   4.12      4.9   5.08        83.2    8.40    79.3   10.35 
Praise Need           8.8   4.76      5.5   5.56        84.3    8.76    78.3    9.99 

 * The N’s for the subgroups of subjects represented in this table may be found in 
Table 3. 

** All shifts are positive, for in all cases means are higher on the second midterm and 
on the final examination than on the first midterm. 


The implications of these findings would seem to be that degree of 
tension and amount of need for affection and for praise are related to 
examination performance by students in large classes. One might hazard 
guesses as to the further meaning of these data, for example, the relation 
of these tensions and these needs to academic performance generally, 
regardless of class size, and the deeper meaning of the presence of high and 
low tension and strong and weak needs in regard to personality structure 
and function. 

2. Relationship of Rated Personality Variables to Performance Occurring 
after Contact between Student and Instructor. It was necessary to find a 
measure of the effect on performance of contact with the instructor, at 
the same time relating this effect to the seven personality variables being 



investigated. The sole use of scores on the second midterm examination 
and on the final examination would be inadequate because of the findings 
presented in the previous section. Such scores would be ambiguous in 
meaning for our purposes because of the fact that the personality 
variables were shown to be related directly to performance. This 
relationship would somehow have to be taken into account before the effect of 
the instructor contact could be isolated. 

The measure best serving the purposes of this study was that of the 
differences in performance before and after contact with the instructor 
as related to ratings on the personality variables. Data on such shifts 
in mean performance from the first midterm examination to the second 
midterm examination and to the final examination will therefore be 


Table 5 

Significance of the Difference in Scores on Examinations Taken Before and After 
Interview Contact with Course Instructor 

                        Midterm I to Midterm II              Midterm I to Final 
                     High on    High on    Low on        High on    High on    Low on 
                     Variable   Variable   Variable      Variable   Variable   Variable 
                     vs. Low    vs. Class  vs. Class     vs. Low    vs. Class  vs. Class 
Variable                t          t          t             t          t          t 

Self Tension         +2.21*     +3.27**    + .35         +1.72      +2.33*     + .47 
Family Tension       +2.11*     +3.20**    +1.59         + .24      +1.39      +1.34 
Social Tension       +1.80      +3.01**    + .50         + .91      +1.90      + .88 
Overall Tension      +1.75      +2.93**    + .87         + .30      +1.30      +1.85 
Achievement Need     - .02      +1.57       --           +1.59      +2.19*     + .73 
Affection Need       +3.16**    +3.92**    + .54         +1.04      +1.78      +1.15 
Praise Need          +1.68      +2.85**    +1.00         +1.67      +2.24*     + .75 

 * Significant at the 5 per cent confidence level. 
** Significant at the 1 per cent confidence level. 
-- Entry illegible in the source. 


In Table 4 the means and standard deviations of the differences in 
scores on examinations taken before and after interview contact with the 
instructor are presented. The experimental group is broken down into 
those students rated higher and those rated lower on each personality 
variable. 

The significance of the differences in shifts in examination scores 
between: (1) those students high on each variable and those low; (2) 
those high on each variable and the remainder of the class; and (3) those 
low on each variable and the remainder of the class, was then calculated. 
Table 5 summarizes the resulting t’s. 
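
As an illustration of the three comparisons, the sketch below recomputes them for one variable (self tension, Midterm I to Midterm II) from the means, S.D.s, and N’s printed in Tables 2, 3, and 4. It assumes a pooled-variance t test with N - 1 standard deviations, which the paper does not specify; `pooled_t` is a hypothetical helper, not the author's procedure.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance two-sample t from summary statistics (an assumption)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Shifts from Midterm I to Midterm II, given as (mean, S.D., N)
high = (8.8, 4.78, 16)    # rated higher on self tension (Tables 3 and 4)
low = (4.7, 5.45, 18)     # rated lower on self tension (Tables 3 and 4)
rest = (4.3, 5.31, 223)   # remainder of class (Table 2)

comparisons = {
    "high vs. low": pooled_t(*high, *low),
    "high vs. class": pooled_t(*high, *rest),
    "low vs. class": pooled_t(*low, *rest),
}
for label, t in comparisons.items():
    print(f"{label}: t = {t:+.2f}")
```

The recomputed values fall within roughly a tenth of the tabled +2.21, +3.27, and +.35. The remaining gaps may reflect rounding in the published means and S.D.s or a different S.D. convention, so the sketch should be read as a plausibility check only.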




The data summarized in Tables 4 and 5 suggest that: 


1. The effect of a single interview contact by the individual students 
of the experimental group with their course instructor was not uniform. 
There was significantly (at the 1 per cent level) more effect on those 
students rated higher on self tension, family tension, social tension, over- 
all tension, affection need, and praise need than on those rated lower— 
when one compares the performance of these students with that of the 
non-interviewed students of the class. 

2. The effect of the interview contact diminishes over the ten-week 
period before the final examination, holding up (at the 5 per cent level) 
for only three out of the seven variables, and only then in the case of 
those subjects judged relatively high on these variables. Nonetheless, 
comparison of the scores of the subgroups of the interviewed subjects 
shows them all to have higher mean scores than the mean score attained 
by the remainder of the class as a whole. 


The limitations of this study in terms of numbers of subjects in the 
experimental subgroups, lack of controls over the possibility of the opera- 
tion of “halo effect” on the personality ratings, and the fact that the data 
obtained are from one class at one university and in relationship to one 
instructor do not allow for definite generalizations to students, classes, 
and instructors the world over. However, the writer feels the results of 
this study to be evidence for the value of personal interviews with students 
in large classes. These results further suggest that some students are 
handicapped in their performance by the lack of student-teacher contact 
and the lack of individualization felt when an “unknown” member of a 
class. This study points to the possibility of discovering those students 
who need most and who would profit most from individual attention. 
Conversely, it indicates the possibility of screening those students who 
would be handicapped but little by membership in a large class insofar 
as lack of contact with the instructor is concerned. It is of particular 
interest to the writer that the significant improvement in examination 
performance made by students following the interview with their in- 
structor was made after a single contact, and a contact of only one hour. 
The results of a study on the effects of continued conferences might 
truly be exciting. 

Received November 4, 1948. 


Vocabulary Item Difficulty and Word Frequency 


James J. Kirkpatrick and Edward E. Cureton 
University of Tennessee 


In constructing a vocabulary test, it is desirable in many cases to 
arrange the items of the experimental edition in approximate order of 
difficulty. Test constructors often try to do this by arranging them in the 
order of frequency of occurrence of the key words. Questions 
immediately arise concerning the validity of this procedure, and the possibility of 
improving it by the use of direct judgments. A study designed to throw 
some light on these matters was made, using the 100 four-choice 
vocabulary items of the Army General Classification Test, Forms 1a and 1b. The 
difficulties of these items, as reported by the Staff, Personnel Research 
Section, The Adjutant General’s Office (4), are given in terms of the 
percentages of correct responses made by the soldiers in the experimental 
tryout samples. The Form 1a sample included 400 cases; the Form 1b 
sample, 218. The difficulty values of the Form 1b items were adjusted to 
make them comparable to those of Form 1a. The frequency of the key 
word (the stem-word of the item or the correct answer, whichever was 
least frequent¹) was taken as the frequency value. The frequencies were 
taken from The Teachers Word Book of 30,000 Words (6). This word 
book reports the frequencies of words in terms of the number of 
occurrences per million running words, for the 19,440 words which are 
encountered at least once per million; and in number of occurrences per 
eighteen million running words, for those which are encountered less 
frequently than once per million, but more frequently than once per four 
million. The 952 words which appear 50 to 99 times per million are 
lumped together and simply labeled A; the 1,069 words which appear 
100 or more times per million are all labeled AA. 

¹ In ... of Form 1a, and in one of the 50 items of Form 1b, the correct 
answer had a lower frequency of occurrence than the stem-word, according to the word 
book. 

The number of different words at each frequency-of-occurrence level 
forms a J-shaped distribution, which can be fairly well represented by an 
exponential function. In order to obtain a more or less symmetrical 
distribution of frequency measures, the common logarithms of the 
frequencies of occurrence were grouped into equal intervals. The 
interval-width was determined by the fact that all words in the A-group in the 
word book (50-99 per million) had to go into one group. For the less 
frequent words, the numbers of occurrences per eighteen million were 
divided by eighteen, and the quotients were rounded off to one decimal. 
This procedure gave eleven groups containing the following frequencies: 


         Range of Frequencies     Number of          Number of Words 
Group    per Million              Words in Group     in AGCT 1a and 1b 

  1      100+ (AA)                   1069                   2 
  2      50-99 (A)                    952                   5 
  3      26-49                       1256                   3 
  4      13-25                       1865                  18 
  5      7-12                        2506                  10 
  6      4-6                         2638                  21 
  7      2-3                         3945                  15 
  8      .8-1                         **                   13 
  9      .4-.7                        **                   10 
 10      .2-.3                        **                    2 
 11      not in list                  **                    1 

** Not reported in word book. 
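
The grouping rule can be sketched as a simple lookup. The example below is illustrative only: the function name `frequency_group` and its parameters are hypothetical, and the lower bounds for groups 8 to 10 (.8, .4, .2) are read from a partly illegible table, so they should be treated as assumptions.

```python
# Lower bound of each group's range (occurrences per million), groups 1-10.
# Bounds for groups 8-10 are assumed from a partly illegible table.
GROUP_LOWER_BOUNDS = [100, 50, 26, 13, 7, 4, 2, 0.8, 0.4, 0.2]

def frequency_group(per_million=None, per_eighteen_million=None):
    """Return the frequency group (1-11) for a key word.

    Pass per_million for words listed at one or more per million (use any
    value of 100 or more for AA words and 50-99 for A words), or
    per_eighteen_million for the rarer words. With neither argument the
    word is treated as absent from the word book (group 11).
    """
    if per_million is None and per_eighteen_million is not None:
        # Rare words: divide by eighteen and round to one decimal.
        per_million = round(per_eighteen_million / 18, 1)
    if per_million is None:
        return 11
    for group, lower in enumerate(GROUP_LOWER_BOUNDS, start=1):
        if per_million >= lower:
            return group
    return 11

print(frequency_group(per_million=30))           # falls in the 26-49 range
print(frequency_group(per_eighteen_million=9))   # 9 / 18 = .5 per million
```

Each successive range roughly halves the one before it, which is what equal intervals on a logarithmic scale would produce.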


We were also interested in the possibility of improving on these 
frequency-estimates of difficulty by the use of direct judgment (not, in 
this case, in testing the validity of direct judgment per se). Each of 
the 100 items was therefore typed on a 3” by 5” card, and the frequency 
group recorded on the face of the card. Each judge was presented with 
the eleven groups of cards, informed concerning the basis of the grouping, 
and requested to rearrange the cards among the eleven groups so that 
they would be more nearly in the order of their “true difficulty” (defined 
as the probability that the average American soldier in World War II 
would get the right answer). They were required to keep to exactly 
eleven groups, but were not required to keep the same number of cards in 
each group as the number given by the frequency-count. Judgments 
were secured from five English instructors, each of whom worked 
independently.² The sum of the five group-allocations was computed 
for each item. These sums ranged from the minimum possible, 5, to the 
maximum possible, 55, the larger numbers representing greater judged 
difficulties. 

A third estimate of difficulty consisted simply of a count of the 
number of syllables in the key word of each item (see Flesch, 2). These 
numbers ranged from one to five. 

² We are indebted to the following members of the English Department of the 
University of Tennessee for making the difficulty judgments: ..., Robert L. 
Hickey, Alice E. Johnson, Clarence P. Lee, and Elizabeth G. Morris. 

The validities of the three methods of estimating item difficulties 
were determined by correlating these estimates with the criterion given 
by the Army study. The correlations are as follows: 


Criterion with frequency          .47 
Criterion with judgment           .71 
Criterion with syllable-count*    .20 
Frequency with judgment           .81 


* Sheppard’s correction applied to standard deviation of syllable-count. 


The last correlation reported above is not a validity coefficient, and 
it is spuriously high because the judges knew the frequency groups to 
begin with. It was computed because it was needed in testing the 
significance of the difference between the first two criterion correlations. 

Inspection of these correlations immediately suggests the marked 
superiority of the frequency-plus-judgment technique, and the equally 
marked inferiority of the syllable-count technique. It seems reasonable 
to suggest, on the basis of this latter finding, that the authors of 
“readability” formulas investigate the relative merits of counting syllables 
as against having a single judge estimate word-difficulties. Since five 
judges participated in this study, and since they started with knowledge 
of the frequency-counts, the outcome of such studies cannot, of course, 
be predicted. 
The significance of the difference between the correlations of frequency 
and judgment with the criterion was evaluated by Hotelling’s adaptation 
of Student’s t-test (3). The value of t was 5.5, which is clearly significant 
at the .001 level. 
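
For readers who wish to check the arithmetic, the following sketch applies the usual form of Hotelling’s t for two correlations that share one variable and are measured on the same cases. It assumes n = 100 items and uses the two-decimal correlations printed above; the function name `hotelling_t` and its argument names are hypothetical.

```python
import math

def hotelling_t(r_xy, r_xz, r_yz, n):
    """t (n - 3 degrees of freedom) for the difference between r_xy and r_xz.

    x is the shared variable (here the criterion); y and z are the two
    predictors, whose own correlation is r_yz.
    """
    # Determinant of the 3 x 3 correlation matrix
    det = 1 - r_xy**2 - r_xz**2 - r_yz**2 + 2 * r_xy * r_xz * r_yz
    return (r_xy - r_xz) * math.sqrt((n - 3) * (1 + r_yz) / (2 * det))

# criterion-judgment .71, criterion-frequency .47, frequency-judgment .81
t = hotelling_t(r_xy=0.71, r_xz=0.47, r_yz=0.81, n=100)
print(round(t, 1))
```

With the rounded correlations this gives a t of about 5.6, close to the reported 5.5; the small gap is presumably due to the authors working from unrounded coefficients.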
Applying the Fisher z-transformation to the correlation of .71 between 
difficulty and frequency-plus-judgment, it was found that the chances 
are 19 to 1 that its “true” value lies between .60 and .80. 
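
The interval can be reproduced with a few lines. This is a minimal sketch, assuming n = 100 items, the standard error 1/sqrt(n - 3) for Fisher’s z, and a normal deviate of 1.96 for odds of 19 to 1; the function name `fisher_interval` is hypothetical.

```python
import math

def fisher_interval(r, n, z_crit=1.96):
    """Confidence limits for a correlation via Fisher's z-transformation."""
    z = math.atanh(r)                      # r -> z
    half_width = z_crit / math.sqrt(n - 3)
    # Transform the limits back from z to r
    return math.tanh(z - half_width), math.tanh(z + half_width)

low, high = fisher_interval(0.71, 100)
print(f"{low:.2f} to {high:.2f}")
```

Under these assumptions the limits round to .60 and .80, in agreement with the values stated in the text.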
A second study, concerned with difficulty and frequency only, was 
based on a set of items consisting of word-pairs to be marked S, O, or N, 
depending on whether their meanings were the same, the opposite, or 
neither. Three hundred such items were administered to about 500 high 
school seniors. On the basis of total scores on the 300 items, the top 
100 and the bottom 100 examinees were selected as criterion groups, and 
68 items were discarded on the basis of failure to discriminate between 
these two groups. The difficulty of each of the remaining 232 items was 
taken as the per cent correct in the combined group of 200. The 
frequency value was taken as the ordinal thousand, in The Teachers Word 
Book of 20,000 Words (5), of the least frequently occurring word in the 
pair. For this set of 232 items, the correlation between difficulty and 
frequency was .56. 

Davis (1) has reported low correlations between item difficulty and 
stem-word frequency, as given by The Teachers Word Book of 20,000 


Words (5), for three types of vocabulary items from the Cooperative 
English Test. Two of these item types require the examinee to supply 
a word which matches a given definition. The factor-analysis literature 
(7, e.g.) suggests that such items measure “verbal fluency” to some 
considerable degree, whereas items of the types reported in our own studies 
measure mainly “verbal relations.” The third item type studied by 
Davis was apparently more like those of the Army General Classification 
Test: a stem word followed by five alternatives from among which the 
examinee was required to select the synonym of the stem word. Using 
208 items of this type, Davis found a correlation between difficulty and 
frequency of only .10. 

It is quite possible that the superficial similarity between the Cooper- 
ative Vocabulary Test items and those of the Army General Classification 
Test is considerably greater than their actual content similarity. The 
Cooperative items were designed to measure precision of knowledge of 
fairly common words. Davis criticizes the practice of including rare 
words to provide difficult items in vocabulary tests. He says (1, pp. 
71-2), “The difficulty of a multiple-choice vocabulary item for a given 
group of subjects is dependent on two main factors: first, the per cent 
of the group that could define the word correctly if asked to state its 
meaning; and, second, the degree of discrimination required to distinguish 
between the correct answer and the incorrect answers, or decoys, in the 
item. The importance of this second point has often been overlooked 
with unfortunate results. Test constructors have built items to test for 
knowledge of words like ‘syzygy’ or ‘umbel.’ Such words have virtually 
no practical value except to specialists in certain learned professions; 
hence, they reduce the real validity of general vocabulary tests, but they 
have been included to provide very difficult items in vocabulary tests 
that are not made up of items in which the decoys have been chosen with 
care and ingenuity so that they differ only slightly, though incontestably, 
from the correct meanings of the words being tested.” 

The force of this argument would appear to depend on the purpose 
for which the test is designed. We can see no objection to designing 
vocabulary tests to measure range of word knowledge at a low level of 
discrimination, as well as precision of knowledge of fairly common words. 
The same-opposite-neither test is clearly of the former type. The 
Cooperative Vocabulary Test is of the latter type. The vocabulary items 
of the Army General Classification Test fall somewhere between these 
two extremes. Examination of its item-alternatives suggests that it is 
probably more nearly a range test than a precision test. 

Comparing the three correlations between frequency and difficulty,
there appears to be a fairly definite trend. For the precision-type
Cooperative Vocabulary Test the correlation is .10. For the vocabulary
items of the Army General Classification Test, it is .47. For the same-
opposite-neither test, it is .56. It seems reasonable to suggest, as a
hypothesis if not as a conclusion, that the nearer a vocabulary test comes
to being a measure of range rather than precision of word knowledge, the
higher will be the correlation between the frequency values of its key
words and the difficulties of its items. Moreover, the estimates of
difficulty based on frequency can be improved markedly by the use of
direct judgment.

Received May 5, 1949.
Early publication. 


References

1. Davis, F. B. The interpretation of frequency ratings obtained from “The Teachers
Word Book.” J. educ. Psychol., 1944, 35, 169-174.

2. Flesch, R. A new readability yardstick. J. appl. Psychol., 1948, 32, 221-233.

3. Hotelling, H. The selection of variates for use in prediction with some comments
on the problem of nuisance parameters. Annals of Math. Statist., 1940, 11, 3,
271-283.

4. Staff, Personnel Research Section, The Adjutant General’s Office. The Army General
Classification Test, with special reference to the construction and standardization
of Forms 1a and 1b. J. educ. Psychol., 1947, 38, 385-420.

5. Thorndike, E. L. The teacher’s word book of 20,000 words. Bureau of Publications,
Teachers College, Columbia University, 1931.

6. Thorndike, E. L., and Lorge, I. The teacher’s word book of 30,000 words. Bureau
of Publications, Teachers College, Columbia University, 1944.

7. Thurstone, L. L. Primary mental abilities. Psychometric Monograph No. 1,
University of Chicago Press, 1938.


Influence of Prestige Suggestion on the Answers of 
a Personality Inventory * 


Joseph F. Donceel, Benjamin S. Alimena and Catherine M. Birch 
Fordham University 


The following investigation was inspired by an experiment performed 
in 1933 by two German psychologists, H. Krüger and K. Zietz (1). They 
composed an artificial personality description and told each of 39 subjects 
that this description was based on a graphological analysis of their hand- 
writing and on a study of their horoscope. All the subjects accepted 
this one standard description as a good analysis of their personality;
some were amazed at its accuracy; not a single subject rejected the
diagnosis as a whole.

Among the possible explanations of this surprising result, the authors
noted: the fact that the subjects do not know their own personality struc- 
ture; their suggestibility; the vague and ambiguous character of many of 
the statements used in the personality description. 

The purpose of the present experiment was to find out to what extent 
subjects would accept as applying to them a personality description ob- 
tained by mere chance, even when the statements used in this description 
were not vague and ambiguous, and even when no effort had been made to
avoid the contradictions which derive from a random accumulation of 
statements. 

First Experiment, Using Mild Suggestion. The subjects for the first 
experiment were 34 students in a psychology class for adults, both men 
and women, ranging in age from 20 to 55 and in education from four years 
completed in High School to two years completed in College. The 
subjects were asked to hand in a specimen of their hand-writing, and 
they were told that the experimenter would have it analyzed by a graphol- 
ogist, and would give them a detailed description of their personality, 
based on this analysis. 

In fact, the experimenter just took for each subject a Bernreuter
Personality Inventory and answered its 125 questions at random. The
questions were matched with the 125 first figures of a table of random
numbers; when the figure for a certain question was even, that question
received a “yes” answer; when the figure was odd, that question received a
“no” answer. A week after the handwriting samples had been received,
the Bernreuter Inventories were given to the subjects with the affirmative
or negative answers, and the subjects were asked to check each of the
statements, and to indicate whether they agreed or disagreed with the
answer.

* ... 12th International Congress of Psychology, Edinburgh

From chance alone we expect a number of agreements averaging 50
per cent, that is, an average agreement with 62.5 of the 125 suggested
answers. Any number of agreements higher than 73 would occur by
chance alone only 5 times out of 100, whereas a number of agreements
higher than 77 would occur by chance alone only once in 100 times.
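
The cut-off values quoted above can be checked with a short calculation. The sketch below assumes a two-tailed normal approximation to the binomial distribution; the paper does not say which method the authors used, and the function name `chance_threshold` is invented for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def chance_threshold(n_items: int, alpha: float, p: float = 0.5) -> float:
    """Upper two-tailed critical number of agreements under pure chance.

    Assumption: normal approximation to a binomial with n_items trials
    and agreement probability p on each trial.
    """
    mean = n_items * p                       # 62.5 for 125 questions
    sd = sqrt(n_items * p * (1 - p))         # binomial standard deviation
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical z
    return mean + z * sd


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        print(alpha, round(chance_threshold(125, alpha), 1))
```

Under these assumptions the thresholds come out near 73.5 and 76.9 agreements, which agrees closely with the values of 73 and 77 given in the text.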

The number of agreements of the 34 subjects ranged from 60 to 100.
The average number of agreements was 78 with a standard deviation of
9.41. The results of 4 subjects excluded the null hypothesis at the 5 per
cent level of confidence, whereas the results of 15 more excluded the null
hypothesis at the 1 per cent level of confidence. In other words, 19 out
of our 34 subjects agreed with the suggested statements more often than
could be explained by chance alone. They gave evident signs of sug-
gestibility in their self-analysis.

Second Experiment, Using Stronger Suggestion. The second experiment
employed stronger suggestion. Fifty subjects were used, 25 men
and 25 women, ranging in age from 18 to 33 and in education from two
years completed in High School to completion of Graduate Studies.
[The subjects were] given individually a Rorschach Inkblot Test and a
Murray Thematic Apperception Test. Next, allegedly on the basis of these
tests, the experimenter answered orally, in the presence of the subject,
the 125 questions of a Bernreuter Inventory (that is, gave for each
question the answer determined by the dice) and asked the subject to tell
whether or not he or she agreed with that answer.

From chance alone we expect an average number of 62.5 agreements.
The actual number of agreements ranged from 83 to 125; the average
was 111.6 with a standard deviation of 9.16. Since chance alone is
excluded at the 1 per cent level of confidence for any number of
agreements higher than 77, the null hypothesis was excluded for every one
of the 50 subjects.

There was no reliable difference between the amounts of agreements
shown by the men and by the women. For the men the average was
112.1 and for the women 111.0.

Every question of the Inventory was answered for the 50 subjects.
Therefore, from chance alone, it is expected that 25 subjects will agree
with the suggested answer to each question. Any number of agreements
on a single question higher than 34 is significant at the 1 per cent level.
We obtained an average agreement per question of 44.6 with a standard
deviation of 3.44 and a range of 35-50. Hence, for each single question,
effective suggestion could be established at the 1 per cent level of con-
fidence.

If we consider each question individually for the men alone, we find 
two questions for which the number of agreements is only 17 out of 25. 
Here chance alone cannot be excluded, even at the 5 per cent level of 
confidence. These questions are: “Do you ever complain to the waiter 
when you are served inferior or poorly prepared food?” and “Have you 
been the recognized leader of a group within the five last years?” 

The same applies for three questions of the female group: “Do you 
frequently argue over prices with tradesmen or junkmen?” (For this 
question suggestion did not work at all, the percentage of agreements 
was only 52); “Do people ever come to you for advice?” and “Are you 
systematic in caring for your personal property?” It will be noticed that 
these five questions are of a clearly factual nature. 
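
The per-question statements above (more than 34 agreements out of 50 is significant at the 1 per cent level; 17 out of 25 is not significant even at the 5 per cent level) can also be checked with an exact binomial tail. This is only a sketch: the two-tailed convention and the helper name `two_tailed_p` are assumptions of this illustration, not something stated in the paper.

```python
from math import comb


def two_tailed_p(agreements: int, n_subjects: int) -> float:
    """Exact two-tailed probability of at least `agreements` agreements
    out of n_subjects when every subject agrees with probability 1/2."""
    upper_tail = sum(
        comb(n_subjects, k) for k in range(agreements, n_subjects + 1)
    ) / 2 ** n_subjects
    # The chance distribution is symmetric, so doubling the upper tail
    # gives the two-tailed probability (capped at 1).
    return min(1.0, 2 * upper_tail)


if __name__ == "__main__":
    print(two_tailed_p(35, 50))  # lowest per-question agreement observed
    print(two_tailed_p(17, 25))  # the two weakest questions for the men
```

With these assumptions, 35 agreements out of 50 fall below the 1 per cent level while 34 do not, and 17 agreements out of 25 stay above the 5 per cent level while 18 would fall below it, which is consistent with the cut-offs reported in the text.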

Immediately after the test with suggestion, the experimenter ex- 
plained to the subjects that the answers which had been presented had 
been obtained by a mere chance procedure and were therefore without 
value. He gave each subject an unanswered Bernreuter Inventory and 
asked him or her to answer all the questions personally. This would 
yield a measure of the endurance of the suggestion. 

Instead of finding the expected average of 62.5 agreements with the 
previously suggested answers, we found an average of 87.4 agreements, 
with a standard deviation of 10.2 and a range of 67-109. In 40 out of the 
50 subjects suggestion could still be established at the 1 per cent level of 
confidence. Only 6 subjects were able to shake off the suggestion enough 
to yield results insignificant even at the 5 per cent level of confidence. 

Third Experiment, with Suggested Reversal. In our last experiment 
the subjects were 49 sophomores attending a Liberal Arts College for 
Women. This time the subjects first answered the questions of a 
Bernreuter Inventory in the ordinary way, without any suggestion. 
Then they were given a group Rorschach Test. Four weeks later the
experimenter met each subject individually and told her that, for a certain
number of answers, the results of the Rorschach Test contradicted the 
answers given by the subject. She was asked whether she did not feel 
that the answer suggested by the Rorschach Test corresponded better 
with reality. In other words, in this experiment we did more than to 
suggest a certain answer to the subject; we tried by means of suggestion 
to make her repudiate or reverse a previously given answer and accept 
the opposite answer as the true one. 

We did not try, of course, to make the subjects change all the pre-
viously given answers; they would have suspected some trick. Reversal
was attempted under suggestion for one third approximately of the
answers, that is, for 42 answers taken at random. Of these 42 suggestions,
an average of 26 was accepted, or approximately 60 per cent. There were,
of course, considerable individual differences; the lowest number of ac-
cepted reversals was 10 per cent, the highest number was 94 per cent.

These results are highly significant. It is true that Lentz (2) found
that when subjects were retested with the Bernreuter after a lapse of
one to four weeks, they changed approximately 20 per cent of their
original answers. If we take that amount of change as a measure of
the modifications which may be due to the mere lapse of time, we find
that the 60 per cent of change discovered in our experiment yields a chi
square of 39.67, which is considerably above the 6.64 required for signifi-
cance at the 1 per cent level of confidence.


Summary

1. The questions of a Bernreuter Personality Inventory were answered
for a group of subjects. These answers, obtained by mere chance, were
presented as the results of psychological tests, and the subjects were
asked to tell whether they agreed or disagreed with these answers.

2. When mild suggestion was used, 19 out of 34 subjects accepted the
answers more often than could be explained by chance alone.

3. When stronger suggestion was used, 50 out of 50 subjects yielded to
suggestion.

4. Subjects were also induced, under suggestion, to repudiate 60 per
cent of their own answers to the Inventory and to accept as true the
opposite answers.

Received January 4, 1949.


References

1. Krüger, H., and Zietz, K. Das Verifikationsproblem. Z. angew. Psychol., 1933, 45,
140-171.

2. Lentz, T. F. Reliability of opinionaire technique studied intensively by the retest
method. J. soc. Psychol., 1941, 14, 220-256.


A Note on Kahn and Hadley’s 
“Factors Related to Life Insurance Selling” 


S. Rains Wallace, Jr. 
Life Insurance Agency Management Association, Hartford, Conn. 


In a recent article in this Journal, Kahn and Hadley (1) have re- 
ported a study in which it was proposed: “first, to determine the degree 
of relationship that exists between relative success in the early period 
of selling life insurance and success at a later period; second, to examine 
various selling activities with a view to uncovering certain factors which 
differentiate successful from unsuccessful agents... ; third, to in- 
vestigate further certain personal history items and personality traits 
already known to correlate with success in selling life insurance, and to 
analyze other measurable areas of personality, with the aim of increasing 
the sensitivity of existing selection methods.” The authors further assert 
that “The identification of individuals for whom the likelihood of success is 
known would not only benefit management, but would, to some extent, 
minimize feelings of frustration on the part of the agent who, from the 
outset, may be doomed to failure.” 

The writer is in full accord with these aims (if dubious of “personality 
traits known to correlate with insurance success” and “measurable areas 
of personality”). However, he also believes that the sample of in- 
surance salesmen employed in this study was singularly ill-chosen and 
has characteristics which serve to vitiate a number of the study’s con- 
clusions. Considerable work has been done in this field (2, 3, 4, 5, 6) and 
more is in progress. It is therefore important that major findings not 
be obscured by conclusions drawn from fragmentary and inadequate data. 

Kahn and Hadley studied 84 “new life insurance agents” who had 
attended the Purdue Course in Life Insurance Marketing. It is implied
that these men were a random group of individuals who had just entered 
the life insurance business. This, unfortunately, is not true. Many of 
these men had sold insurance before coming to the school. Furthermore, 
there is reason to believe that the companies and agency managers in- 
volved tended to send to the school those men whom they regarded as 
most promising. This seems probable in light of the fact that many of 
the men were subsidized to some degree by companies or managers during 
the course. Certainly, there is little evidence that the group is in any 
sense representative of new life insurance salesmen in general. 


The authors state that the salesmen represented 19 life insurance
companies. What they neglect to mention is that life insurance com-
panies are not homogeneous with respect to agents’ production. One
study (5) has shown that, among 11 large insurance companies in Canada,
the companies’ median average monthly production of agents who
survived 12 months ranged from $5,500 to $13,700 in the first year.
Even among 7 United States companies of equivalent size, an analysis
of variance shows that the first-year sales production of agents who
survive for 12 months is heterogeneous at the 1% level.

In short, the sample employed is not relevant to the problems as
stated, is curtailed to an unknown degree, and the criterion of success
(sales for the duration of the school term) is contaminated by unrecog-
nized and, with the number of salesmen involved, undetectable company
differences.

Most of the conclusions listed by the authors are therefore question- 
able. It is stated that the correlation between sales during the first 13 
weeks of selling and second period of 13 or more weeks is +.55. The 
statement should read during the first 13 weeks of selling after entrance
in a school. It should be qualified by noting that the distribution is 
curtailed and that the correlation has probably been increased spuriously 
because of the effect of company differences. 

The curtailment involved in the selection of the sample must also be
considered in interpreting the statement that only one of the four personal
history items investigated differentiates significantly between successful
and unsuccessful life insurance salesmen. If the authors had employed
more cases drawn from a sample of truly “new” agents and avoided
widespread criterion categories, they would have found that age at entry
has a significant, but curvilinear, relation to a success criterion (5, 6)
and that minimum monthly income required has a similarly curvilinear
and significant relation (6). They would also have found that agents
with no dependents are significantly inferior to others in their first-year
performance (6).

The conclusions concerning the differentiation of the criterion groups
by various test items and total test scores are, of course, open to the same
criticisms. Furthermore, the implication that this work is of value in the
“identification of individuals for whom the likelihood of success is
known” and, therefore, in the selection of life insurance agents, becomes
highly suspect if it is remembered that many of these individuals were
tested when their life insurance careers were well under way. The fact
of success or failure may be a powerful determiner of test responses.

The problems of sampling, of restriction of range, and of criterion con-
tamination are as real in investigations of the salesman as in any other.
Studies in which these problems are unrecognized or ignored can serve
only to introduce further inaccuracies into an already confused field.


Received April 28, 1949. 
Early publication. 


References 


1. Kahn, D. F., and Hadley, J. M. Factors related to life insurance selling. J. appl.
Psychol., 1949, 33, 132-140.

2. Life Insurance Agency Management Association. 23800 recruits a year later. Hart-
ford, Conn.: Life Insurance Agency Management Assoc., 1948, pp. 33.

3. ——. Financing, survival, and production. Hartford, Conn.: Life Insurance Agency
Management Assoc., 1949, pp. 12.

4. ——. New agent characteristics. Hartford, Conn.: Life Insurance Agency Manage-
ment Assoc., 1949, pp. 12.

5. ——. Canadian recruiting and results. Hartford, Conn.: Life Insurance Agency
Management Assoc., 1949, pp. 57.

6. ——. Recruiting results. Hartford, Conn.: Life Insurance Agency Management
Assoc., 1949, pp. 57.




A Comment on Wallace’s Note on 
“Factors Related to Life Insurance Selling” 


J. M. Hadley and D. F. Kahn 
Division of Education and Applied Psychology, Purdue University 


It would appear that Wallace (9) is quite concerned that readers will 
misinterpret a recent article by Kahn and Hadley (3). Careful examina-
tion of the article in question will reveal that no generalizations are
offered. The opening paragraph of the section entitled “Summary and
Findings” on page 138 reads as follows: “Based solely on the criterion of
written business, and pertaining only to those particular life insurance
salesmen investigated in this study, the following conclusions may be
drawn.” All references in this section are to differences which “were
found” to exist within the group of salesmen studied. No predictions
were made concerning results which might be obtained from other
samples. It is difficult for the writers to understand how “inaccuracies
can be introduced into an already confused field” if research reports are
read objectively and unintended generalizations are not inferred from
admittedly “fragmentary and inadequate data.”
Several of Wallace’s points will be considered separately: 


1. Wallace believes that “the sample of insurance salesmen employed 
in this study was singularly ill-chosen and has characteristics which serve 
to vitiate a number of the study’s conclusions”. The sample may be 
inadequate in many ways. It would be excellent if an entirely unselected 
sample could be obtained. It is doubted if such an entirely unselected 
sample was ever studied. The samples of recruits considered in the 
excellent studies by the Life Insurance Agency Management Association 
(4, 5, 6, 7, 8) are undoubtedly more adequate than those studied by Kahn 
and Hadley. Certainly, interest in being recruited also biases the samples 
studied by the Association to an unidentified degree. However, it is 
maintained that inadequacies inherent in the sample do not vitiate the 
conclusions concerning differences within the group. 

2. Wallace criticizes the designation of the subjects of the study as 
“new life insurance agents.” He also states that “many of these indi- 
viduals were tested when their life insurance courses were well under 
way.” A careful recheck of the data indicates that 95 per cent of the 
sample had not sold insurance before coming to the Purdue Life Insurance 
Marketing School. Actually, only four of the original 84 beginning
students had ever sold insurance. Two subjects had sold, or attempted to
sell for longer than three months: one for nine months, and one for two 
years. The experimenters did not intend to include any subjects re- 
porting more than three months’ experience. Apparently two subjects 
were included by error. Shortly after the data were collected, the school 
began to require a minimum amount of experience. This was not true 
at the time the study was conducted. Furthermore, the original intent 
was to gather information of value to the school. In line with that 
purpose, it is submitted that the best sample would be classes of students 
in that school. Consequently, the new agents in classes I and II were 
selected. It is agreed that the sample is pre-selected by their companies 
and agency managers. The subjects may not be “in any sense repre- 
sentative of new life insurance salesmen in general” but they are repre- 
sentative of the first two classes attending the school. Again it is em- 
phasized that the conclusions are limited to this group. 

3. It is unfortunate that Kahn and Hadley in their condensed pub- 
lished article neglected to recognize the lack of homogeneity between 
life insurance companies and other complexities of the problem. Kahn 
(1, 2) in the original thesis has discussed the complexity of the problem 
at length. 

4. Wallace states “most of the conclusions listed by the authors are 
therefore questionable.” For some of the reasons which he states gen- 
eralizations would be questionable, but one cannot question conclusions 
and results of a specific research study without questioning the integrity 
of the research workers. The writers accept the suggestion that the
statement on page 135, line 7, of the results should read, “. . . during the
first 13 weeks of selling after entrance in the school.” It should be noted 
that the word “a” as suggested by Wallace has been changed to “the” 
by the writers. It would be interesting from the standpoint of re- 
search methodology to discover whether the effect of curtailment on 
the distribution and the effect of company differences, as discussed by 
Wallace, would increase or decrease the size of the reported coefficient 
of correlation. 

5. Wallace seems to be disturbed that Kahn and Hadley did not 
obtain the same results as were obtained in several studies which he has 
reported. With a larger sample it is entirely possible that many differ- 
ences would have been found to be more highly significant. On page 
136 it is reported that age at entry, number of dependents, and minimum 
living expenses per month showed a positive relationship to the criterion. 
Dichotomies made in the range of each of the above-mentioned personal 
history items offer a means of showing the relationships between these 
measures and the criterion. Thirty-one agents of 30 years of age and
above averaged a mean weekly production of $5905 as contrasted with
an average mean weekly production of $4612 for the 47 agents at age 29 
or below. A similar average for 32 agents claiming two or more de- 
pendents was $6303 per week in comparison with $4307 per week for 
46 agents claiming one or no dependents. For 13 agents requiring 
$250 a month or more as a minimum living expense the average mean
weekly production was found to be $6554 as contrasted with $5022 for 59 
agents requiring below $250 for living needs. Further manipulation of 
the data was not attempted because of the recognized inadequacy of 
the sample. Apparently the results obtained do tend to confirm those 
discussed by Wallace. 

6. Wallace’s criticisms of the conclusions concerning the differentia-
tion of the criterion groups by various test items and total test scores
are, as previously discussed, again not considered relevant to the results 
but would be relevant to generalizations based on them. 

7. Finally, Wallace states that the fact of success or failure may be 
a powerful determiner of test responses. This is granted, particularly in 
reference to the preference tests and to a lesser degree, personality tests. 
It is doubtful if it affects intelligence or biographical data. However, if 
test responses at any stated level of experience have any predictive value 
for future success, then they have validity for that purpose. In this con- 
nection some of the results of the study in question are indicative of the 
need for further research. 


The writers gather from the general tone of the note that Wallace feels 
research in this area has been retarded and confused rather than advanced 
by the publication of the study being discussed. It is doubtful whether 
any research problem can be clarified by withholding legitimate data 
(and all data are legitimate) even though the population from which the 
data are derived is inadequate. Even single case datum is sometimes
valuable. We must use care not to generalize beyond the scope of
the data.

The writers would like to take this opportunity of urging that the
valuable research by workers in the life insurance field be published in
the scientific psychological journals so that it will be more readily avail-
able to academic research workers.

Received May 19, 1949.

References

1. Kahn, D. F. An analysis of life insurance salesmen. Unpublished Master’s thesis,
Purdue University Libraries, West Lafayette, Indiana, 1946.

2. Kahn, D. F. An analysis of factors related to life insurance selling. Unpublished
Doctorate Dissertation, Purdue University Libraries, West Lafayette, Indiana,
1948.

3. Kahn, D. F., and Hadley, J. M. Factors related to life insurance selling. J. appl.
Psychol., 1949, 33, 132-140.

4. Life Insurance Agency Management Association. 2300 recruits a year later. Hart-
ford, Conn.: Life Insurance Agency Management Assoc., 1948, pp. 33.

5. ——. Financing, survival, and production. Hartford, Conn.: Life Insurance Agency
Management Assoc., 1949, pp. 12.

6. ——. New agent characteristics. Hartford, Conn.: Life Insurance Agency Manage-
ment Assoc., 1949, pp. 12.

7. ——. Canadian recruiting and results. Hartford, Conn.: Life Insurance Agency
Management Assoc., 1949, pp. 57.

8. ——. Recruiting results. Hartford, Conn.: Life Insurance Agency Management
Assoc., 1949, pp. 57.

9. Wallace, S. Rains. A note on Kahn and Hadley’s “Factors Related to Life Insurance
Selling.” J. appl. Psychol., 1949, 33, —— - ——.


Instrument Reading. I. The Design of Long-Scale Indicators 
for Speed and Accuracy of Quantitative Readings * 


Walter F. Grether 
Aero Medical Laboratory, Wright-Patterson Air Force Base, Dayton, Ohio 


Quite a number of instruments used in aviation and elsewhere must
be read with precision greater than can be provided by one revolution of
a pointer on a circular dial of conventional size. There is considerable
accumulated evidence that, except for the direct reading counter, most
of the devices that have been used to increase effective scale length result
in instruments that are very difficult to read. In a previous study by the
author (2) on the design of clock dials, it was found that as common an
instrument as a clock is quite difficult to read. Even the best clock
designs required approximately 5 seconds (including recording time) for
readings in hours and minutes by Air Force pilots. Even with this time
spent on each reading, about 7 per cent of the readings on the better
clocks were in error.

Aside from such laboratory data there is considerable evidence of 
instrument reading difficulties in the practical situations where these 
instruments are used. In a study of actual errors made by pilots in 
reading aircraft instruments carried out by Fitts and Jones (1), multiple- 
pointer or long-scale instruments provided the greatest number of serious 
cases of instrument misreading. The instrument reported as being 
misread most frequently was the altimeter. In the typical report the 
altimeter was read too high by a complete revolution of the most sensitive 
pointer, that is by 1000 feet. A tachometer designed with a rotating 
sub-dial to indicate RPM in thousands was likewise read too high by 
1000 RPM. Numerous fatal and non-fatal accidents have been attrib- 
uted directly to such instrument reading errors, and without doubt 
many of the unexplained crashes resulted from similar human failures.

The major purpose of the present investigation was to make a direct
comparison in terms of speed and accuracy of quantitative readings of
several of the possible methods of obtaining increased scale length on
instruments. The experiment also had a secondary but more specific
and practical purpose of finding improved methods of indicating altitude
in aircraft. For this reason all of the instruments were designed to read
altitude in feet and all readings were made in feet as units.

* The data presented in this paper have been previously reported in Memorandum
Reports TSEAA-694-14 and MCREXD-694-14A of the Aero Medical Laboratory,
Engineering Division, of the USAF Air Materiel Command.

It is emphasized that the evaluation of the different indicator designs 
in this investigation was with respect to the speed and accuracy of 
quantitative readings. Actually this is only one of several criteria which 
most instruments should be required to satisfy. It has been pointed out 
by the author (3) that in aviation in particular there would appear to be 
at least three major ways in which instruments may be read, depending 
upon the purpose of the reader. These three types of reading may be 
categorized as follows: a. Check reading—for assurance of a null, normal, 
or desired indication; b. Qualitative reading—for the direction and 
approximate magnitude of a deviation from a null, normal, or desired 
indication; and c. Quantitative reading—for the numerical value of an 
indication. 

The above categories of instrument reading have considerable utility 
as criteria against which to evaluate different instrument designs. It is 
usually possible from a knowledge of the situation in which an instrument 
is to be used to decide the reading purposes or criteria which it is most
necessary to satisfy. The criteria against which an instrument is to be 
evaluated then provide operational definitions of the experimental meas- 
urements to be made. As mentioned earlier, the experimental indicators 
in this investigation were evaluated only with respect to the third cri- 
terion, quantitative reading. In this study, furthermore, there was no 
concern with small errors of interpolation, only with larger errors re- 
sulting from assignment of incorrect values to graduation marks. 


Experimental Procedure 


Nine experimental altitude indicator designs were used in this investigation. 
These are shown along with some of the results in Figure 1. The first of these 
indicators, design A, is a simulation of the altimeter almost universally used in 
military and larger commercial aircraft. On this instrument the longest 
pointer gives readings in hundreds of feet, the broad pointer is read on the 
same scale in thousands of feet, and the small pointer is read on the same scale 
in ten-thousands of feet. Altimeter designs B and C also simulate existing 
but not commonly used types. 
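
The way the three pointers of design A combine into one altitude can be sketched in a few lines of Python. This is an illustration only, not part of the original study: the function names are hypothetical, and the sketch assumes that all three pointers move continuously over a single dial numbered 0 to 9, as the description above implies.

```python
def pointer_positions(altitude_ft):
    """Dial position, in numeral units from 0 up to 10, of each pointer of design A."""
    hundreds = (altitude_ft % 1000) / 100            # longest pointer: one turn per 1,000 ft
    thousands = (altitude_ft % 10000) / 1000         # broad pointer: one turn per 10,000 ft
    ten_thousands = (altitude_ft % 100000) / 10000   # small pointer: one turn per 100,000 ft
    return ten_thousands, thousands, hundreds


def combine_correctly(ten_thousands, thousands, hundreds):
    """Use the numeral last passed by each coarse pointer, then add the hundreds."""
    return int(ten_thousands) * 10000 + int(thousands) * 1000 + round(hundreds * 100)


def combine_nearest_numeral(ten_thousands, thousands, hundreds):
    """Hypothetical misreading: the thousands pointer is read to the nearest numeral."""
    return int(ten_thousands) * 10000 + round(thousands) * 1000 + round(hundreds * 100)


positions = pointer_positions(13900)
correct = combine_correctly(*positions)
misread = combine_nearest_numeral(*positions)
```

At 13,900 feet the broad pointer stands just short of the numeral 4, so the nearest-numeral reading in this sketch comes out 1000 feet too high, which is the kind of error reported for the altimeter in the pilot accounts cited earlier.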

Altimeter design D uses a single pointer to indicate altitude in hundreds 
of feet. This pointer makes one revolution for each 1000 feet change in altitude 
and the multiples of 1000 feet are indicated on a simulated direct reading 
counter. This counter has two drums, one for 1000-foot and the other for 
10,000-foot increments. It is assumed that the motion of these drums would 
be intermittent and that single whole numbers would always be showing. 

In design E, also, only one pointer is used, but two dials rotating behind a
window indicate the multiples of 1000 feet. In this design the motion of the
dials showing through the window is assumed to be continuous rather than
intermittent, thus permitting more than one number (or half numbers) to
appear.

Design F indicates altitude in quite a different manner from the other 




Instrument Reading. I 365 


instruments. In this display the pointer is assumed to make only one revolu- 
tion to cover the entire altitude range. The range being covered is indicated 
in the window as 0-1000 feet, 0-10,000 feet, or 0-100,000 feet. The meaning 
of the numerals on the dial graduations is, therefore, determined by the range
indicated in the window. This indicator is similar in principle to a radio 
altimeter now in use. It is obvious that the precision of indication on such 
an instrument decreases as the range being covered increases. 
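
The dependence of design F on the range shown in the window can be sketched as follows. The dial numbering (0 to 10 over one revolution) and the number of graduation marks per revolution are assumptions made for illustration, since the text does not give them, and the function names are hypothetical.

```python
RANGE_LIMITS_FT = (1_000, 10_000, 100_000)   # upper limits that can appear in the window
GRADUATIONS_PER_REVOLUTION = 50              # assumed: five marks per numeral, ten numerals


def design_f_altitude(dial_numeral, range_limit_ft):
    """Altitude for a pointer standing at dial_numeral (0 to 10) on the given range."""
    if range_limit_ft not in RANGE_LIMITS_FT:
        raise ValueError("range not provided on this indicator")
    return dial_numeral * range_limit_ft / 10


def smallest_division_ft(range_limit_ft):
    """Altitude change represented by one graduation interval."""
    return range_limit_ft / GRADUATIONS_PER_REVOLUTION


# The same pointer position means three different altitudes.
same_pointer = [design_f_altitude(4.5, limit) for limit in RANGE_LIMITS_FT]
divisions = [smallest_division_ft(limit) for limit in RANGE_LIMITS_FT]
```

Under these assumptions each tenfold increase in range makes one graduation interval worth ten times as many feet, which is the loss of precision noted above.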

Altimeter designs G and H are similar in that they simulate a scale moving 
vertically behind a window. An instrument following design G could use 
either an endless tape or drum to present the moving scale, with a counter to 
indicate multiples of 1000 feet. An instrument using design H would require 
a very long tape with a scale covering the desired altitude range. 


[Figure 1: drawings of the nine indicator designs, A through I. Under each design, paired bars give the per cent of errors of 1,000 feet or more and the interpretation time in seconds. Legend: 97 AAF pilots; 79 college students.]
Fig. 1. Speed and accuracy in reading altitude from different types of instruments.


The last experimental design, I, simulates a simple direct reading counter
without any [...] pointer or scale. For reasons pointed out in the
discussion of results, such an indicator would probably be [unsuitable] for
the pilot, but might be suitable for other aircrew members such as the
navigator. One of the major reasons for including it in this [study was to get an]
approximate measure of the time required to copy a series of numbers
representing an altitude reading, it being [...] time would
be [...] reading altitude from this type of indicator.

[...] The cover (page 1) of each booklet [...]
for the subject to read. [...] with 12 different settings. Under each picture
was a space for writing in the reading.
The large number of drawings needed for the nine test booklets were produced by
Miss Mary Cowles of the Psychology Branch with the photographic assistance of
Mr. D. M. Penrose of the Laboratory Services Unit of the Aero Medical Laboratory.




Special precautions were taken in the preparation of the drawings and choice 
of altitude settings to be used in the various test booklets to prevent biasing 
the results for or against any of the indicator designs. The circular dials were 
2% inches in diameter. From this other dimensions can be estimated from 
Figure 1. All essential numerals and graduation marks were sufficiently large 
and distinct to be easily legible. Except for the inner dials on designs B and
C, all scales were alike in having numerals at all 100-foot graduations with
intervening marks at 20-foot intervals. Other factors equalized were the 
number of settings above and below 10,000 feet, the number of sensitive 
pointer settings on 100-foot graduation marks, the number of sensitive pointer 
settings just preceding and just following the zero on the scale, and the number 
of sensitive pointer settings on the left and right halves of the dial. Precau- 
tions were also taken to be sure that no essential information was hidden by 
any of the hands, and that the interrelationships between pointer positions 
were correct. For indicator design F some of the settings were midway 
between graduation marks. For the remaining designs the sensitive pointer 
(or reference mark) was always on a graduation mark. Thus, no interpolation 
was required to obtain correct readings. 

The altimeter reading test was taken by 97 USAF pilots in the Instrument 
School at Barksdale Field, Louisiana, and 79 college men (without aircrew 
experience) at Denison University, Granville, Ohio. In administering the 
test, the booklet for only one altimeter design was passed out at a time, and 
sufficient time was allowed for reading the instructions and working the sample 
item. At a signal all subjects opened the booklet and worked until completing
all items. Each subject’s completion time was recorded on his booklet. 
Four sequences for administering the nine test booklets were used in order to 
counterbalance for learning effects. An approximately equal number of sub- 
jects (in each of the two subject groups) took the test in each sequence. 
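
A minimal sketch of how subjects might be spread over the four administration sequences is given below. The article does not say how subjects were assigned or what the four booklet orders were, so the round-robin rule and the names here are assumptions for illustration only.

```python
from collections import Counter


def assign_sequences(n_subjects, n_sequences=4):
    """Assign subjects to sequence numbers 0..n_sequences-1 in rotation (assumed rule)."""
    return [subject % n_sequences for subject in range(n_subjects)]


pilot_counts = Counter(assign_sequences(97))     # 97 USAF pilots
college_counts = Counter(assign_sequences(79))   # 79 college men
```

With 97 and 79 subjects the groups cannot be exactly equal, which agrees with the "approximately equal number" wording above; rotation keeps the largest and smallest groups within one subject of each other.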

The two subject groups of dissimilar experience were used in order to get 
some measure of the effect of experience on the ability to read the various dial 
designs. All of the USAF pilots can be assumed to have spent several years
flying with altimeter design A, and possibly to have had some experience with
designs B and C. The college men can be assumed to have had little, if any, experience
in reading altitude from any type of indicator. In general intelligence and 
education the two groups were very similar. 


Results of Comparisons Among Indicator Designs 


The data obtained in this investigation were analyzed to determine 
the frequency of errors and the time per instrument reading. These 
results are shown in Table 1, which gives the per cent of total readings
in error for the nine indicator designs. None of the errors included 
in this table resulted from inaccuracies in pointer interpolation since all 
settings of the sensitive pointers were on graduation marks (except for 
design F which had some settings midway between marks). 

Data on speed of reading are also shown in Table 1. It will be re- 
called that the subjects wrote their answers in the test booklets and the 
time for completion was recorded in each instance. The average time per 
reading could thus be computed from the total time and the total number 

2 Altimeter reading errors during actual flight probably occur with much lower
frequency than found in this study, since in flight the pilot can anticipate the
approximate readings.




Table 1

                  USAF Pilots, N = 97           College Men, N = 79
Altitude        Per Cent    Seconds per       Per Cent    Seconds per
Indicator        Errors      Reading*          Errors      Reading*
Design            (a)          (b)              (c)          (d)
A                15.9          9.6             20.8          9.8
B                15.0          8.6†            17.9          9.6
C                 8.3†         7.3†            11.4          7.6†
D                 3.5†         4.2†             2.1†         4.1
E                17.3          8.8             15.3†         9.2
F                24.1          8.7†            21.0          8.3†
G                 2.1†         4.8†             3.0†         4.2†
H                 2.5†         4.2†             4.5†         4.2†
I                 0.6†         2.5†             0.3†         2.3†

                                                                             Confidence
                                                        N              r       level
Correlation between speed and accuracy
for different designs:
  For pilots (columns a and b)                     9 designs          .91       1%
  For college students (columns c and d)           9 designs          .95       1%

Correlation between pilots and college
students on different designs:
  Per cent errors (columns a and c)                9 designs          .95       1%
  Seconds per reading (columns b and d)            9 designs          .99       1%

Correlation between speed and accuracy of
individuals on all designs:
  Pilots                                           97 pilots          .38
  College students                                 79 college students .44

* Reading time included time for subject himself to record answer.
† Indicates statistical significance (at one per cent level of confidence) of superiority
over conventional altimeter (design A).


of items, but this time included the time for recording as well as for 
reading or interpreting the instrument. 

A reproduction of each of the experimental indicator designs accom- 
panied by graphic illustrations of the more significant findings is provided 
in Figure 1. The upper pair of bars under each indicator shows the 
per cent of errors equal to or exceeding 1000 feet for the two groups of 
subjects. The lower pair of bars gives the computed interpretation time 
for each of the two groups of subjects. An estimate of the time for inter- 
pretation only was obtained by subtracting from the average time for 
each design the average time for design I (the direct reading counter). 
The reading of altitude from design I involved the mere copying of the 
numbers shown, and hence was assumed to require no interpretation. 
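
The subtraction described above can be written out as a short sketch. The seconds-per-reading values are those for the USAF pilots in Table 1 (column b) as read from the scan; the function and variable names are hypothetical.

```python
# Seconds per reading for USAF pilots, Table 1, column (b).
PILOT_SECONDS_PER_READING = {
    "A": 9.6, "B": 8.6, "C": 7.3, "D": 4.2, "E": 8.8,
    "F": 8.7, "G": 4.8, "H": 4.2, "I": 2.5,
}


def interpretation_times(seconds_per_reading, baseline="I"):
    """Subtract the copying time of the baseline design from every design's time."""
    copy_time = seconds_per_reading[baseline]
    return {design: round(total - copy_time, 1)
            for design, total in seconds_per_reading.items()}


pilot_interpretation = interpretation_times(PILOT_SECONDS_PER_READING)
```

For designs A and D this gives 7.1 and 1.7 seconds, the interpretation times quoted for pilots in the discussion of design D.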




Discussion of Results 


Indicator Designs A, B, and C. The results of this investigation, as 
shown in Figure 1 and Table 1, show that Design A, which simulates the 
conventional altimeter, is a very difficult instrument from which to ob- 
tain quantitative readings as required in this study. Even the pilots, all 
of whom had spent several years flying with this instrument, spent more 
time per reading on this indicator than on any of the other designs 
studied. Only one of the remaining eight indicators, design F, resulted
in a higher proportion of errors. It must be concluded that it is a very
difficult task to combine into a single numerical value the readings of 
three pointers indicating on a single scale, as required in reading the con- 
ventional altimeter. Designs B and C apparently were slightly easier 
to read than design A. 

Indicator Design D.* This indicator uses only one pointer, with the 
1000-foot and 10,000-foot indications provided by a counter. Such a 
combination proved to be very easy to read. For USAF pilots the per 
cent of total errors was very low, 3.5 per cent, and only 1.7 sec. was 
required for interpretation (as contrasted with 15.9 per cent and 7.1 
sec. for the conventional altimeter). More significant, perhaps, is the 
finding that only 0.7 per cent of the readings erred by more than 1000 feet. 
Most of the errors in reading indicator design D resulted from assigning 
10 feet instead of 20 feet to each of the graduation intervals between 
numerals. 

Indicator Design E. The substitution for two of the pointers on the 
altimeter of two rotating dials appearing through a window appears to 
have no advantage. This indicator was designed so that under most 
circumstances only one number would appear on each of the two rotating 
dials. But if such dials rotate continuously (rather than intermittently) 
during altitude changes, as assumed in this test, it is inevitable that at 
certain settings two numbers will be equally visible. Such indications 
are very difficult to read correctly. 

Indicator Design F. On this indicator the range covered by the indi- 
cating pointer and scale is dependent upon range limits shown in the 
window. The high proportion of errors and slow reading time suggest 
that the required changes in interpretation of the primary scale are 
difficult for human beings to carry out. 


* On the basis of this study indicator design D, combining a sensitive pointer with a
direct reading counter, was recommended as a replacement for the existing three-pointer
altimeter. As a consequence the Kollsman Instrument Division of the Square D Company
is developing such an altimeter. Two other items of aviation equipment
currently being developed by the Air Force, an absolute (radio) altimeter and an
airborne distance measuring device, are also incorporating this type of indication.




Table 2

Frequency of Various Types of Error Made by 97 USAF Pilots and 79 College Students
in Reading the Conventional Three-Pointer Altimeter

                                                                             College
Description of Error                                              Pilots    Students

[Reading] to lower adjacent numeral when
nearest numeral is correct
(Failure to consider more sensitive pointer)
                                             100 Ft.               0.0       0.0
                                             1,000 Ft.             0.26      2.22
                                             10,000 Ft.            0.0       0.11
                                             Total                 0.26      2.32

[Mis]placement of digit in number series
(Interchange of digit with adjacent zero)
                                             20 Ft.                0.17      0.42
                                             100 Ft.               0.86      0.95
                                             1,000 Ft.             2.06      2.64
                                             10,000 Ft.            0.86      1.48
                                             Total                 3.95      5.48

[Description missing in source]
                                             20 Ft.                3.09      2.64
                                             100 Ft.               1.20      1.05
                                             1,000 Ft.             1.46      2.85
                                             10,000 Ft.            0.09      0.53
                                             Total                 5.84      7.07

Omission of one pointer
                                             100 Ft.               0.0       0.0
                                             1,000 Ft.             0.86      0.21
                                             10,000 Ft.            0.86      1.05
                                             Total                 1.72      1.27

Pointer exchange
                                             100 and 1,000 Ft.     0.17      0.84
                                             100 and 10,000 Ft.    0.0       0.0
                                             1,000 and 10,000 Ft.  0.09      1.48
                                             Total                 0.26      2.32

Repetition of reading on one pointer                               0.95      0.84

Complex and unclassified errors                                    0.86      1.48




Indicator Designs G and H. The vertical moving scale instruments
proved to be easy to read in this experiment. The virtues of such
instruments for check reading and qualitative reading were not evaluated
in this study.

Indicator Design I. This indicator, which simulates a simple Veeder 
counter, was read with greatest speed and accuracy of all the indicators. 
This would suggest that where only quantitative readings are to be
made, this would be the most desirable type of instrument. It is
believed that for check reading and for qualitative reading such an 
instrument would be inferior to one using a moving pointer. 


Types of Error in Reading the Three-Pointer Altimeter


Because of the widespread use of the three-pointer altimeter, and 
because of the frequent use of this type of multiple pointer indication for 
other purposes, it seemed worth-while to make a more detailed analysis of 
the types of errors made in quantitative readings of this instrument. 
This analysis was based on the same data that have already been summarized
in Table 1 and Figure 1. It will be recalled that 97 USAF pilots
and 79 college students each made 12 readings on the three-pointer
altimeter. This gave a total of 1164 readings by pilots and 948 by
non-pilots.

The detailed classification of errors into types and sub-types is shown
in Table 2 along with the per cent of total readings in which each occurred. 
Two or more types or sub-types of errors were in some cases charged 
against a single erroneous reading. For this reason the figures in the per 
cent columns total up to more than the total per cent errors as reported 
in Table 1. For detailed descriptions of all the error types, and the 
assumed mental processes which led to the incorrect answers, the reader 
is referred to Aero Medical Laboratory Memorandum Report No. 
MCREXD-694-14A.
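
Because the percentages in Table 2 are based on a known number of readings, they can be turned back into approximate counts. The sketch below is illustrative and uses hypothetical names; it assumes each tabled percentage is a whole number of readings divided by the group total and rounded to two decimals.

```python
SETTINGS_PER_BOOKLET = 12


def total_readings(n_subjects):
    """Readings made on one design by a group of n_subjects."""
    return n_subjects * SETTINGS_PER_BOOKLET


def readings_from_percent(percent, n_subjects):
    """Nearest whole number of readings corresponding to a tabled percentage."""
    return round(percent / 100 * total_readings(n_subjects))


pilot_total = total_readings(97)
college_total = total_readings(79)
pilot_lower_numeral = readings_from_percent(0.26, 97)     # 1,000 Ft. row, pilots
college_lower_numeral = readings_from_percent(2.22, 79)   # 1,000 Ft. row, college students
```

On this reading, a tabled 0.26 per cent for pilots corresponds to about 3 of their 1164 readings, and 2.22 per cent for college students to about 21 of their 948.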

Discussion 

In an experiment such as this a number of questions arise with regard
to the suitability of the criterion measures which have been used and with
regard to [the effect] of the subject group upon the results. For this reason
there have been included in Table 1 a number of correlation coefficients
which bear on these problems.

One question facing the experimenter is the effect of the experience
of the subject group upon the validity of the findings. In the
present study two subject groups were used which represented
[extremes of experience] as related to the task being performed. All USAF
pilots had considerable experience in reading one of the experimental
indicator designs along with general experience in reading aircraft
instruments. The college students, on the other hand, included no pilots or
military air crew members. In spite of this difference in background
of experience the two groups gave highly similar results as indicated
by a correlation between the two groups of .95 on per cent errors
and .99 on seconds per reading. This would suggest that the previous
experience of the subjects is of relatively minor importance in an
experiment of this type.
In the present experiment neither speed nor accuracy of response was
controlled, thus making possible two independent criteria for evaluation
of the different dial designs. In Table 1 the correlations between speed
and accuracy for the different dial designs are .91 for pilots and .95 for
college students, indicating very high agreement between the two criteria
of goodness of the several indicator designs. Or stated differently, the
indicator designs which were read most rapidly were also read most
accurately. Correlation coefficients between speed and accuracy of
individuals for all designs are also positive, but much lower, .38 for pilots
and .44 for college students. These values indicate, however, that in
[general] the individuals who read the indicators most rapidly also read
them most accurately. In a previous study by the author (2) on clock
dial designs the correlation coefficients were likewise positive, but
somewhat lower in magnitude.
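
The design-level correlations can be checked against the Table 1 columns with a short sketch. The article does not say which coefficient was computed; a Pearson product-moment coefficient is assumed here, and the column values are those read from the scan of Table 1. The function name is hypothetical.

```python
from math import sqrt

# Table 1 columns for designs A through I.
PILOT_ERRORS = [15.9, 15.0, 8.3, 3.5, 17.3, 24.1, 2.1, 2.5, 0.6]      # column (a)
PILOT_SECONDS = [9.6, 8.6, 7.3, 4.2, 8.8, 8.7, 4.8, 4.2, 2.5]         # column (b)
COLLEGE_ERRORS = [20.8, 17.9, 11.4, 2.1, 15.3, 21.0, 3.0, 4.5, 0.3]   # column (c)
COLLEGE_SECONDS = [9.8, 9.6, 7.6, 4.1, 9.2, 8.3, 4.2, 4.2, 2.3]       # column (d)


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


r_pilots = pearson(PILOT_ERRORS, PILOT_SECONDS)          # speed vs. accuracy, pilots
r_college = pearson(COLLEGE_ERRORS, COLLEGE_SECONDS)     # speed vs. accuracy, college men
r_errors = pearson(PILOT_ERRORS, COLLEGE_ERRORS)         # agreement of groups on errors
r_seconds = pearson(PILOT_SECONDS, COLLEGE_SECONDS)      # agreement of groups on time
```

Under the Pearson assumption these four coefficients should come out close to the values of .91, .95, .95 and .99 given in Table 1. Note that a positive coefficient here links longer reading times with more errors, which is what the text means by speed and accuracy agreeing.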
In two previous experiments on instrument design by Loucks (4) and
Sleight (5) a somewhat different technique was used in that the
instrument exposure time was controlled tachistoscopically and only accuracy
of reading was measured. Such a technique might be expected to force an
increased error rate and thus accentuate the differences between indicator
designs. It is the belief of the author, however, that such a control of
exposure does not constitute control over response time, but serves
rather to restrict the number of visual fixations of the displayed material.
The actual response may be delayed for several seconds during which
the subject retains a mental image of the indicator scale and pointer.
It is quite possible that in the experiment of Sleight (5) the use of a
controlled exposure time which did not permit a change in the
preparatory eye fixation led to erroneous findings. It is believed that this
technique favored the fixed pointer indicators on which the subject was able
to anticipate the location of the pointer. The two fixed pointer indicators
in the present study, designs G and H, showed no general superiority
over the only comparable moving pointer indicator, design D.


Summary 


An evaluation was made of the speed and accuracy with which
quantitative readings could be made of nine experimental altitude
indicators. The results are considered to apply also to other types of
quantitative indication which require very great scale length. Evalua- 
tion of the various indicator designs was made by having 97 USAF pilots 
and 79 college men read 12 settings on each instrument. The instru- 
ment faces were reproduced in test booklets which provided spaces for 
writing in the readings. Both accuracy and speed-of-reading data were 
obtained for each of the nine indicator designs. 

The major conclusions indicated by the results of this investigation 
are as follows: 


1. The combining into a single numerical value of the indications 
from two or more pointers, or from a pointer and rotating subdials, is a 
relatively difficult task for human beings. Such instruments are con- 
ducive to very large errors in reading. 

2. The ease with which long scale indicators can be read quantitatively
appears to depend upon the extent to which the digits are already
combined in the proper sequence by the instrument.

3. A multiple pointer instrument such as the altimeter with contin- 
uous motion of the non-sensitive pointers is frequently read too high by 
a complete revolution of the sensitive pointer. 

4. The speed and accuracy of instrument reading are positively 
correlated, indicating that gains in reading speed can normally be ex- 
pected to improve accuracy also. 

5. College men without altimeter reading experience showed virtually 
the same pattern of results in this study as highly experienced USAF 
pilots, suggesting that instrument reading difficulties are quite basic in 
nature and not readily modified by experience. 

Received October 25, 1948. 


References 


1. Fitts, P. M. and Jones, R. E. Psychological aspects of instrument display. I.
Analysis of 270 “pilot error” experiences in reading and interpreting aircraft
instruments. USAF Air Materiel Command Memorandum Report No.
TSEAA-604-12A, 1947.

2. Grether, Walter F. Factors in the design of clock dials which affect speed and
accuracy of readings in the 2400-hour time system. J. appl. Psychol., 1948, 32,

3. Grether, Walter F. Designing instrument dials for quick, accurate reading.
Machine Design, 1948, 20, 150-152 and 208-209.

4. Loucks, R. B. Legibility of aircraft instrument dials: The relative legibility of
[...] dials. AAF School of Aviation Medicine, Project No. 265, Report

5. Sleight, Robert B. The effect of instrument dial shape on legibility. J. appl.
Psychol., 1948, 32, 170-188.


Types of Errors in Location Judgments on Scaled Surfaces. 
I. Errors of Configuration *


Adelbert Ford 
Department of Psychology, Lehigh University 


Many instruments have been so designed that an operator is required
to locate the position of a signal on a flat area with reference to a
superimposed system of scales, which may be either polar or rectangular. This
study deals with the latter. The nature of this signal may be a small
dot, or a white patch of small dimensions, which appears suddenly and
must be reported for its elevation on the y-axis and its horizontal location
on the x-axis. This is typical in using some of the types of radar
cathode-ray scopes.
In practice there are two systems for keeping such a signal or spot
located. 1. A transparent plastic scale, engraved with suitable reference
lines, may be placed over the area; the operator locates the proper line,
follows to the end, and notes the position between engraved numerals,
interpolating between lines when necessary. 2. The operator may be
provided with a “tracker” which he pushes around the face of the area,
keeping it superimposed on the signal, and this mechanism registers the
x- and y-values on remotely located meters. The latter has been found
to be objectionable because it usually required two operators, and it is
mechanically complex. However, the simple method of using scaling
assistance may involve intolerable errors and greater mental concentration,
and therefore it is desirable to know just what kinds of errors an average
operator does commit in using scaling assistance, on the basis of
quantitative experimentation. If these types of errors are found to be intolerable,
then it is worth while to pursue such engineering design as may eliminate
the human error in scale reading methods.

The present study deals with the first of four types of reading errors 

* This research was executed under Contract No. W28-099-ac-130 between the 
Institute of Research, Lehigh University, and the USAF Air Materiel Command, 
Watson Laboratories, Red Bank, N. J. The investigation was made to ascertain the 
accuracy of radar operators in the interpretation of scope signals. 

The author wishes to thank the psychologists on the staff of the Aero Medical 
Laboratory, Psychology Division, Wright-Patterson Air Force Base, Dayton, Ohio, 
for suggestions concerning the equipment area needing study, and the officers and 
psychologists of the Strategic Air Command, Andrews Field, Washington, D. C., for 
field facilities in securing typical operating records. 




on scaled surfaces. These will be called errors of configuration because we
shall show that the configuration or shape of the field produces systematic 
errors in one part of the field as contrasted with errors in another part 
of the field. 

A sector scope is essentially a triangular area. This sets the condition
for the perspective illusion. Objects near the apex of the triangular space
appear larger than at the open end of the triangle. It would be expected
that elevation judgments, with respect to a zero line of reference, would
be correspondingly exaggerated. The questions are: How much? Are
all people susceptible?

There are many citations of general principle in the literature. 
Ponzo (1) showed the principle with respect to estimated lengths of 
lines. Köhler and Wallach (2) maintained that space estimations at 
the open end were underestimated while those at the apex were
overestimated.


Method 


1. Artificial Series. The types of scope faces used in the artificial
series are presented in Figures 1 to 6. The figures on the left are for the
sector-type of radar scope, commonly used, and show the condition for
the perspective illusion. The figures on the right are rectangular
presentations used as a “control” with the same kind of problems. All scope
pictures were 7 inches in diameter, viewed at a distance of 16 inches, or (in
group experiments) with an equivalent visual angle.
2. Natural Series. Figure 7 exhibits a photograph of a real radar 
sector-type scope, one of the stimulus series which we presented with the 
ultra-violet radar simulator. The white spot at the right is a signal from 


Fig. 1. Sector “unscaled” scope.
Fig. 2. Rectangular “unscaled” scope.
Fig. 3. Sector scope with 100-foot side lines.
Fig. 4. Rectangular scope with 100-foot side lines.
Fig. 5. Sector scope with multiple scaling.
Fig. 6. Rectangular scope with multiple scaling.


an approaching airplane about to land. In this series it was impossible 
to use the rectangular scope for comparison, because no scopes were 
made that way. 

All projected images were on a phosphorescent radar screen, of 
typical color and signal persistence, except in the group experiments. 
Signals were presented serially in a fairly realistic rate of progression. 
All signals were white on a black field.
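
The "equivalent visual angle" mentioned for the group experiments can be made concrete with a small calculation from the viewing conditions given for the artificial series (a 7-inch picture at 16 inches). The group viewing distance used below is a hypothetical figure chosen only for illustration, as are the function names.

```python
import math


def visual_angle_deg(size_in, distance_in):
    """Visual angle, in degrees, subtended by an object of given size at given distance."""
    return math.degrees(2 * math.atan(size_in / (2 * distance_in)))


def size_for_angle(angle_deg, distance_in):
    """Object size that subtends angle_deg at the given distance."""
    return 2 * distance_in * math.tan(math.radians(angle_deg) / 2)


individual_angle = visual_angle_deg(7, 16)
# Hypothetical group session: screen viewed from 160 inches.
group_picture_size = size_for_angle(individual_angle, 160)
```

A 7-inch picture at 16 inches subtends a little under 25 degrees; keeping that angle at ten times the distance requires a projected picture ten times as wide.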

Subjects were scored for error and reaction time by recording verbal 
answers as rapidly as made, and using a chronoscope actuated by a 
voice key. 

There were three types of scaling assistance, as indicated in Figures 




Fig. 7. Photograph of sector presentation on a real
radar scope used for ground-control-approach. 


1 to 6. (a) The “unscaled” scope showed merely a zero line of reference 
with a marginal standard for space values (Figures 1 and 2). (b) The 
“100-foot scope” used the same zero reference line, but had a parallel 
line, above and below, to show 100 feet of signal deflection. The lines 
representing 100 feet were 0.4 inch above and below the zero line (Figures 
3 and 4). (c) The scope with “multiple scaling” had fine lines, 0.1 inch 
apart, with every fourth line heavier. Fine lines represented increments 
of 25 feet. Heavy lines represented 100 feet deflection increments 
(Figures 5 and 6). 
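
The scale relations just described (0.4 inch for 100 feet, fine lines every 0.1 inch for 25 feet) amount to a fixed conversion from deflection on the scope face to feet. The sketch below is illustrative; the function names are hypothetical and the example deflection is invented.

```python
FEET_PER_INCH = 250.0   # 100 feet of deflection per 0.4 inch on the scope face
FINE_LINE_FEET = 25     # spacing of the fine lines in the multiple-scaling scope


def deflection_feet(deflection_in):
    """Signal deflection in feet for a displacement in inches from the zero line."""
    return deflection_in * FEET_PER_INCH


def nearest_fine_line(feet):
    """Elevation of the fine line closest to the given deflection."""
    return FINE_LINE_FEET * round(feet / FINE_LINE_FEET)


example_feet = deflection_feet(0.33)          # invented deflection of 0.33 inch
example_line = nearest_fine_line(example_feet)
```

With multiple scaling the operator interpolates only within a 25-foot interval, whereas on the "unscaled" scope the whole deflection must be estimated against the marginal standard.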
Results ¹

It will be necessary to remember that these experiments were run on 
rather complex equipment, generally one run at a time, and that the 
number of readings possible was thus restricted, and the number of 
subjects used was necessarily limited. All differences were subjected to 
calculations of critical ratio, and the differences in means with a signifi- 
cance better than the one percent level have been shown in italics in all 


tables. It will be obvious that statements made under conclusions are 
qualified. 


1 Expanded tables, in much greater detail, have been reported by A. Ford and M. G. 
Getz, The Perspective Illusion in Radar Sector Scopes, Technical Report No. 1, Contract 
W28-099-ac-130, Watson Laboratories, Air Materiel Command, USAF, 10 June 1948. 


It was not possible to keep subjects completely unsophisticated with
respect to the existence of the illusion during the 18 months of work.
Therefore we shall show training effects by separate tables.

1. Untrained Subjects on Artificial Scopes. The introductory [...]
one-third area of the open end of the sector with the one-third area at
[the apex] end of the scope field. This trend is reduced to a single figure
in percentage of over or underestimation. A similar problem is
[...] same manner, and used as a control. Presentations were randomized.


Table 1
The Perspective Illusion in Untrained Subjects
Percentages of Error in Elevation Judgments
Individual Experiments, Artificial Scope
Combined Data for Four Subjects
Plus signs indicate over estimations; minus signs, under estimations.

                          Sector Scope                Rectangular Scope
                         (Experimental)                   (Control)
Type of                               Right                                Difference
Scaling           Left                (Open                                (Illusory
Assistance        (Apex)   Center     End)       Left    Center   Right    Trend)
Unscaled          +19.6    -9.8       -.1        -1.1    -.3      +2.3     +23.1
100 ft. Lines     +14.0    -.3        +2.4       +.8     +3.1     -.4      +10.4
Multiple Lines    -4.5     +2.8       +1.1       +.1     +2.5     +1.5     -4.2

Note: Difference values expressed in italics are better than the one per cent level of
significance. Values are averages based on 40 to 55 runs.


The first, or introductory, runs showed a strong illusory trend of
overestimation at the apex of the sector scope (Table 1), when the field was
“unscaled.” The introduction of a pair of 100-foot reference lines (0.4
inch on each side of the line of zero position) had a marked effect in
reducing the illusion, but did not eliminate it. When the multiple scaling
system was used (Figures 5 and 6) there was no reliable evidence of
illusory effect.

2. Partially Trained Subjects on Artificial Scopes. After 40 to 55
runs the next section of the training series (Table 2) showed a reduction
in the amount of the illusion on the unscaled scope, and no significant
illusory trends for the scopes with 100-foot scaling lines, or the multiple
scaling system. Evidently scaling reduces the illusion.
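
The "Difference (Illusory Trend)" column in the tables for combined subjects appears to be the apex-to-open-end change on the sector scope minus the corresponding left-to-right change on the rectangular control. That formula is an inference from the tabled figures, not a definition stated in the surviving text, and the values below are as read from the scan of Table 1 with decimal points inferred; the function name is hypothetical.

```python
def illusory_trend(sector_apex, sector_open_end, rect_left, rect_right):
    """Sector gradient (apex minus open end) less the control gradient (left minus right)."""
    sector_gradient = sector_apex - sector_open_end
    control_gradient = rect_left - rect_right
    return round(sector_gradient - control_gradient, 1)


# (sector apex, sector open end, rectangular left, rectangular right), untrained subjects.
UNTRAINED = {
    "Unscaled": (19.6, -0.1, -1.1, 2.3),
    "100 ft. Lines": (14.0, 2.4, 0.8, -0.4),
    "Multiple Lines": (-4.5, 1.1, 0.1, 1.5),
}

untrained_trends = {scaling: illusory_trend(*values)
                    for scaling, values in UNTRAINED.items()}
```

Under this assumption the three rows reproduce the tabled differences for untrained subjects, falling from a strong positive trend on the unscaled scope to a small negative one with multiple scaling.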

3. Individual Differences. Dealing with the spread of “random 




Table 2
The Perspective Illusion, Second Training Stage
Percentages of Error in Elevation Judgments
Individual Experiments, Artificial Scope
Combined Data for Four Subjects
Plus signs indicate over estimations; minus signs, under estimations.

                          Sector Scope                Rectangular Scope
                         (Experimental)                   (Control)
Type of                               Right                                Difference
Scaling           Left                (Open                                (Illusory
Assistance        (Apex)   Center     End)       Left    Center   Right    Trend)
Unscaled          +11.8    +6.9       -2.7       -2.0    -7.1     +.7      +17.2
100 ft. Lines     +3.9     +3.1       +4.1       -.3     +2.1     +.7      +.8
Multiple Lines    -.3      -1.8       +.6        +.8     +.4      +1.7     -1.8

Note: Difference values expressed in italics are better than the one per cent level of
significance. Values are based on 55 to 109 runs.


errors,” all subjects were very much alike, showing closely similar
standard errors. However, with respect to the illusion, it had become evident
that some subjects have strong susceptibility, and an occasional subject
seems to be completely free from any proneness. Table 3 is presented
to show individual differences. One subject entered too late to be given
the runs on the unscaled scope, but is included to complete the data.


Table 3 
The Perspective Illusion, Second Training Stage 
Percentages of Error in Elevation Judgments 
Individual Experiments, Artificial Scope 
Data for Individual Differences 
Plus signs indicate over estimations; minus signs, under estimations. 


                   Unscaled Scope                 100-ft. Reference Lines        Multiple Scaling
              Sector    Rectan-   Differ-     Sector    Rectan-   Differ-     Sector    Rectan-   Differ-
              Scope     gular     ence        Scope     gular     ence        Scope     gular     ence
Initials of   (Experi-  (Con-     (Illusory   (Experi-  (Con-     (Illusory   (Experi-  (Con-     (Illusory
Subjects      mental)   trol)     Trend)      mental)   trol)     Trend)      mental)   trol)     Trend)
R.J.R.        +27.1     -1.5      +28.6       +.3       -1.4      +1.7        -.9       +5.0      -5.9
B.P.H.        +16.9     -7.8      +24.7       +.6       -2.1      +2.7        +.3       -2.8      +3.1
L.A.A.        +12.3     -1.0      +13.3       +1.9      -.2       +2.1        -.4       -2.9      +2.5
D.M.S.        +3.2      +1.6      +1.6        -9.0      +1.0      -10.0       -2.3      -.3       -2.0
W.A.S. (New Subject)                          +2.2      -5.5      +7.7        -1.7      -4.1      +2.4


Note: Difference values expressed in italics are better than the one per cent level of
significance. Values are based on from 13 to 25 runs.




This is a breakdown from data in the series for partially trained subjects.
subjects are quite strongly susceptible. One is not. 

This led us into an experiment in which the same problems were
projected on a large screen for a group experiment on untrained subjects,
with the visual angle kept approximately the same. Table 4 arranges
the subjects in order from most susceptible to least susceptible. The
purpose of this section was to secure more extended data on individual


Table 4 
The Perspective Illusion in Untrained Subjects 
Percentages of Error in Elevation Judgments 
Group Experiments, Artificial Scope
Plus signs indicate over estimations; minus signs, under estimations.


     Unscaled Scope                 100-ft. Reference Lines        Multiple Scaling
Sector    Rectan-   Differ-     Sector    Rectan-   Differ-     Sector    Rectan-   Differ-
Scope     gular     ence        Scope     gular     ence        Scope     gular     ence
(Experi-  (Con-     (Illusory   (Experi-  (Con-     (Illusory   (Experi-  (Con-     (Illusory
mental)   trol)     Trend)      mental)   trol)     Trend)      mental)   trol)     Trend)


+33.3 —31.2 +64.5 +13 +74 —6.1 —24 +80 —10.4 
+55.5 —.6 +661 +8.3 —16.7 +25.0 —2.2 +6.9 —8.1 
+52.8. —2.9 +55.7 +39.3 +8 +38.5 =7.4 —3.3 =4.1 
+52.8 -2.0 +54.8 +3.3 -4.3 +7.6 +16.8 —17.2 +34.0 
+49.8 —8 +50.6 +5 -2.8 +2.8 +6.7 —11.6 +18.3 
+65.0 +14.6 +50.4 +6.9  +2.6 +4.3 +3.6 +83.7 ml 
+36.5 +1 +36.4 +64 —5.8 +12.2 “6 =7 +1 
+26.1 —54 +31.5 +5.6 —3.1 +8.7 —5.0 —2.2 —2.8 
+17.2 —11.1 +283 —3.1 -9 2.2 =24 +1.7 4.1 
+25.5 +13 +242 =1. +2 —.3 +7 1.5 +2.2 
+18.2 -5.2 +23.4 +31 —15 +4.6 —2.4 +38 —6.2 
+148 -6.0 +20.8 +69 13 +8.2 +.6 +25 =1.9 
+23.7 +3.1 +20.6 +3.1 —20.7 +23.8 +14 -10 +24 
+16.9 —3.7 +20.6 +7 +5 +.2 +23.3 —18.9 +42.2 
+31.4 +12.9 +18.5 —6.1  —9.9 +83.8 +15 —41 +5.6 
=1.1 
3 


+114 -5.0 +18.4 1. —2.6 +15 —3.1 -8 —2.3 
+23.3 +7.5 +15.8 +23.3 —10.3 +33.6 -19 +.9 2.8 
+58 +2.2 +3.6 +1.0 —16.0 +17.0 —-9 —10.3 +9.4 


—4 +13.9 —143 +5 —3.0 +3.5 +18.9 -12 +20.1 


Note: Difference values expressed in italics are better than the one per cent level of
significance. Values are averages based on from 14 to 28 runs.


differences. Maximum susceptibility reached 65% overestimation at
the apex of the sector sweep.

4. Untrained Subjects on Reproductions of Field Scopes. Six new
untrained subjects were now tried on reproductions of radar field
scopes, using the ultra-violet projection radar simulator. Three are
susceptible (see Figure 7 and Table 5). One is suspected of being
moderately susceptible. Two showed no reliable differences.
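
The "Difference (Illusory Trend)" column of Table 4 can be read as the error on the sector scope minus the error on the rectangular control scope. The short Python sketch below works through that arithmetic. The function names are illustrative only, and the percentage-error formula (signed error relative to the true value) is an assumption about how the percentages were formed; the text does not state it.

```python
def percent_error(reported: float, true_value: float) -> float:
    """Signed error as a percentage of the true value (assumed definition).

    Positive results are overestimations and negative results are
    underestimations, matching the sign convention printed under each table.
    """
    return 100.0 * (reported - true_value) / true_value


def illusory_trend(sector_pct: float, rectangular_pct: float) -> float:
    """Experimental (sector) error minus control (rectangular) error."""
    return round(sector_pct - rectangular_pct, 1)


# First row of Table 4, unscaled scope: sector +33.3, rectangular -31.2.
print(illusory_trend(33.3, -31.2))  # 64.5
```

Subtracting the control value removes whatever constant bias a subject shows on a field with no perspective cue, so the remainder is attributed to the shape of the sector field.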




Table 5 
The Perspective Illusion in Untrained Subjects 
Percentages of Error in Elevation Judgments 
Individual Experiments, Reproductions from Field Scopes 
Plus signs indicate over estimations; minus signs, under estimations, 


                Unscaled Scope              100-ft. Reference Lines     Multiple Scaling
                                Difference                  Difference                  Difference
Initials of   Apex of   Open    (Illusory   Apex of   Open    (Illusory   Apex of   Open    (Illusory
Subjects      Sector    End     Trend)      Sector    End     Trend)      Sector    End     Trend)


J.H.J. +214 -234 +448 
R.C. +44 -21.9 +26.3 —10 —13.9 +12.9 +2.77 —14.6 +17.3 


F. P.A. +46 —17.1 +21.7 —5.8 -5.7 -1 
CRB. -63 —13.8 +475 +87 —22.7 +31.4 —6.9 +.1 —7.0 
JEP. -7.0 -13.3 +4638 -30 -55 +25 —.6 -16 +10 
RGF. -145 -28 -117 -27 -99 472 -65 +7.0 —18.5 


Note: Difference values expressed in italics are better than the one per cent level of 
significance. Values are averages based on from 17 to 48 runs. 


5. Trained Subjects on Reproductions of Field Scopes. There were 
two subjects who had started with the program, 18 months before. These 
are considered as trained subjects. The data are presented in Table 6. 
Neither has a reliable difference between the apex and the open end of 
the scope, either for unscaled or scaled scopes. One subject, L. A. A., 
had been mildly susceptible in the earlier stages (see Table 3). The 


Table 6 


The Perspective Illusion in Trained Subjects 
Percentages of Error in Elevation Judgments 
Individual Experiments, Reproductions from Field Scopes 
Plus signs indicate over estimations; minus signs, under estimations. 


                Unscaled Scope              100-ft. Reference Lines     Multiple Scaling
                                Difference                  Difference                  Difference
Initials of   Apex of   Open    (Illusory   Apex of   Open    (Illusory   Apex of   Open    (Illusory
Subjects      Sector    End     Trend)      Sector    End     Trend)      Sector    End     Trend)
L.A.A.        -7.8      -9.0    +1.2        -6.0      +2.7    -8.7        -2.2      +3.9    -6.1
D.M.S.        -5.5      -1.5    -4.0        -5.6      -2.8    -2.8        -2.9      -2.7    -.3


Note: All differences have a significance poorer than the one per cent level. 




other subject, D. M. S., had never been susceptible, even on the artificial
scopes. There seems to be evidence that, for some subjects, training
decreases the illusory effect, or even eliminates it.

The Köhler-Wallach Principle. In dealing with the perspective
illusion Köhler and Wallach (2) stated that space judgments in a triangular
area showed overestimation at the apex and also underestimation at
the open end, whereas many of the textbooks stress only the overestimation
at the point of the triangle. The Köhler-Wallach principle is well
illustrated among the first three subjects of Table 5. In the artificial
scope series the principle was still there, though this has been omitted
in the form of our tabulation, but it was much milder, with only slight
tendencies toward underestimation at the open end.


Summary 


When the position of signals on the area of a sector-type scope is near
the apex of the scan-line sweep, in what is essentially a triangular
area, overestimations of space reach as much as 65% for some subjects.
The great majority of all subjects show some degree of susceptibility,
but the range of individual differences extends from complete lack of
proneness to about 65% relative overestimation at the apex. Within the
limitations of the number of subjects used, it might be expected that
of all subjects show some degree of the illusion. 
The design of the field of a radar scope must take into consideration
the shape of the field for its total effect on scope reading errors. With
respect to this illusion, clearly visible multiple scaling will reduce the
distortion, but judgment must be deferred lest other types of
error are introduced, and these will be presented in later articles.


Received April 18, 1949.


References 


» M. Urteilstatischungen über Mengen. Archiv. für die Gesamte Psychologie, 
1928, 65, p. 135. 

2. Köhler, W., and Wallach, H. Figural after-effects, an investigation of visual
processes. Proceedings of the American Philosophical Society, Vol. 88, No. 4, October
1944, p. 288.


Types of Errors in Location Judgments on Scaled Surfaces. 
II. Random and Systematic Errors * 


Adelbert Ford 
Department of Psychology, Lehigh University 


A large variety of instruments require operators to report the position 
of a “signal,” such as a white spot, by reading its position with reference 
to superimposed scaling lines. In dealing with types of radar associated 
with the navigation of aircraft a single large error could cause loss of life 
and the destruction of expensive equipment. 

In the last article ¹ we noted the existence of errors caused by the
shape of the field surface. In the present article, using the same scaling 
and problem sequences, we propose to show: (1) the size of the random 
errors caused by the limiting effects of interpolating scale values of specific 
scales, and (2) certain systematic errors consisting in particular of the 
confusion error, defined as a mistaken interpretation of the numerical 
value of the scale points, and what we shall call persistence errors, defined 
as a proneness of some subjects to bias reports in a sequential series by 
memory effects of the previous reports. 

Although the present report is specifically concerned with position 
reporting from scaled areas, it will probably be instantly perceived that 
some of the principles are perhaps equally applicable to linear scales. 
The consequences of this error analysis are much more basic than the 
narrow application to radar scopes.


Fineness of Scaling and Random Error ²


As illustrated in the previous article, there were three types of scaling
used for these experiments: (1) a scope with a zero line of reference across 
the field, but no other scaling assistance other than a sample scale printed 


* This research was executed under Contract No. W28-099-ac-130 between the 
Institute of Research, Lehigh University, and the USAF Air Materiel Command, 
Watson Laboratories, Red Bank, N. J. The investigation was made to ascertain the 
norne, of radar operators in the interpretation of scope signals. 

¹ Ford, A. Types of errors in location judgments on scaled surfaces: Errors of
configuration. This Journal, Vol. 33, August, 1949.

² Readers who possess a cleared status for restricted reports will find a more
elaborated description of the tables and calculations in: A. Ford and M. H. Getz, Types of
Errors in the Reading of GCA Scaled Scopes, Technical Report No. 4, Contract
W28-099-ac-130, Watson Laboratories, Air Materiel Command, USAF, 31 August 1948.
Restricted.



on the scope for comparison; (2) a scope with a so-called “100-foot
Reference Line” located parallel to and 0.4 inch away from the zero
line of reference; and (3) a scope with a multiple system of parallel lines,
spaced by tenths of an inch, each line representing 25 scaled feet.
The errors were all reduced to percentage values
in this section of the data, using only pips which were 50 or more scaled
feet from the zero line of reference. Figures 1 to 3 are based on the
records of five subjects. (It will be shown later that individual
differences in random error are small.)

Figure 1 shows that, for the unscaled scope, the standard error was


Fig. 1. Distribution of errors on the unscaled scope. (Abscissa: per cent error, unscaled scope.)


Fig. 2. Distribution of errors on the scope with 100-ft. reference lines. (Abscissa: per cent error, 100-ft. lines.)


Fig. 3. Distribution of errors on the scope with multiple scaling. (Abscissa: per cent error, multiple scale; σ = 4.59.)

side lines, 0.4 inch away from the zero line of reference, reduced the
standard error to 8.48%. Figure 3 shows that with the use of a multiple
system of lines, one tenth inch apart, the standard error is now reduced
to 4.59%.

Now Garner 


in the form of ¢ 
multiple system produced 


‘ors, but we shall have to indicate, 


) that the smallest spread of random error 
was produced for the finer scaling. We found no statistically reliable 


difference in verbal reporting reaction time. This may be a difference 




between human reactions on polar scaling, which Garner used, and
scaling, which we used.
At this stage in the experiments we went into a more detailed gathering
of data on the finely scaled scopes, to see whether or not the advantage
of smaller random error was offset by the presence of systematic
errors which could not be tolerated.


Absolute Amount of Random Error 

Since we have ascertained that the more finely scaled scope yielded
the smallest random error, in percentage figures, we shall now confine
ourselves to the absolute values in this scaling situation (lines in
tenths of an inch, representing 25 scaled feet of elevation, with 100-foot
lines emphasized).

In Tables 1 and 2 the standard deviation of the error spread is computed
omitting the confusion errors around the 100-foot scaling line,
which are obviously not random. Mistaken numerical interpretations
around the 25-foot scaling line cannot be distinguished easily from random
errors, but we shall make an attempt, later, to show they exist by
statistical means.

Individual differences, for untrained subjects on group experiments,
with clear, uniform signals, are presented in Table 1. It appears safe to


Table 1 
Random Error, Standard Deviation, Group Experiments, Individual Differences
In the following table the subjects are arranged in the order of best to worst, and all
are untrained. The scaling consists of the multiple system with lines a tenth of an inch
apart.


Stand. Dev.  Stand. Dev.                          Stand. Dev.  Stand. Dev.
in Scaled    in Scope     Number of               in Scaled    in Scope     Number of
Feet         Inches       Readings     Subject    Feet         Inches       Readings
2.8          .011         89           R.K.S.     4.9          .019         89
4.1          .016         89           C.A.W.     5.2          .021         87
4.5          .018         114          C.E.F.     5.5          .022         90
4.5          .018         90           M.K.S.     5.6          .022         88
4.5          .018         89           P.A.W.     5.8          .023         86
4.6          .018         88           K.M.       6.0          .024         110
4.6          .018         64           M.S.W.     6.1          .024         88
4.7          .019         89           DLHE       6.5          .026
4.8          .019         63           B.J.J.     7.0          .028         87
4.9          .019         90


Note: Confusion errors at the 25-foot minor scaling line cannot be accurately separated
from random errors. The above standard errors include these, and are probably
too large. See Table 3 for an attempt at separation.




say, from these data, that average intelligent operators should be able to 
report elevation deflections to a standard error of a plus-or-minus 0.020 
inch of scope distance, under such conditions. This represents an error 
in judging the elevation of a plane of five or six feet, presumably trivial. 
Trained subjects are much more nearly alike in error spread, and we 
have combined the runs in Table 2 to show the absolute error under six 
different experimental conditions. 
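
The relation between scope distance and scaled elevation follows from the scaling described above, in which lines a tenth of an inch apart each represent 25 scaled feet. A minimal Python sketch of the conversion is given here; the constant and function names are illustrative and do not come from the original report.

```python
# Multiple scaling system: lines 0.1 inch apart, each representing 25 scaled feet.
INCHES_PER_LINE = 0.1
FEET_PER_LINE = 25.0
INCHES_PER_FOOT = INCHES_PER_LINE / FEET_PER_LINE  # 0.004 inch per scaled foot


def feet_to_scope_inches(scaled_feet: float) -> float:
    """Convert an elevation error in scaled feet to distance on the scope face."""
    return scaled_feet * INCHES_PER_FOOT


def scope_inches_to_feet(scope_inches: float) -> float:
    """Convert a distance on the scope face to scaled feet of elevation."""
    return scope_inches / INCHES_PER_FOOT


# A standard error of 0.020 inch on the scope corresponds to about five feet.
print(round(scope_inches_to_feet(0.020), 3))  # 5.0
```

The same factor links the paired columns of Table 1: a standard deviation of 4.5 scaled feet corresponds to 0.018 inch on the scope.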


Table 2 
Distributions of Errors under Various Conditions, Elevation Reporting, 
Multiple Scaling, All Subjects Combined 


For a description of the character of each run, as designated by A, B, C, D, E, and F, 
see page 387 of the text. 


Error,
Scaled           Character of Run
Feet     (A)   (B)   (C)   (D)   (E)   (F)    Location of Types of Errors
+110 1 Approximate band of confusion errors 
+105 1 4 around the 100-foot major scaling 
+100 3 8 6 10 line. Errors of overestimation. 
+95 6 3 
+90 1 
+85 1 Approximate band of confusion errors 
+80 1 around the 75-foot minor scaling 
+75 1 line. Errors of overestimation. 
+70 
+65 
g% Approximate band of confusion errors 
+55 1 around the 50-foot minor scaling 
+50 1 line. Errors of overestimation. 
+45 1 
+40 1 
+35 1 i 10 Approximate band of confusion errors 
chou 7 19 around the 25-foot minor scaling 
+25 pide?) 6 13 29 line. Errors of overestimation.
ian ERE 90 tas 97 
+15 BES edie) T SOET 
+10 56 of bg rye i Central band of random errors. 
+5 268 370 375 365 262 193 
00 906 902 688 774 414 189 
Z5 236 20 366 366 275 211 
-10 32 34 83 075 170 153 
a e A E EI 447 


Table 2 (Continued) 


         Character of Run
(A)   (B)   (C)   (D)   (E)   (F)    Location of Types of Errors


yj 4 3 32 7 Approximate band of confusion errors 
7 3 19 27 around the 25-foot minor scaling 
13 line. Errors of underestimation. 
6 


1 4 Approximate band of confusion errors 
around the 50-foot minor scaling 
1 line. Errors of underestimation. 


1 1 
Approximate band of confusion errors 
around the 75-foot minor scaling 
1 line, Errors of underestimation. 
Approximate band of confusion errors 
2 3 around the 100-foot major scaling 
2 3 line. Errors of underestimation. 
1 


Stand. Dev., scaled feet    4.3   5.0   5.1   5.3   9.8   13.3


The six conditions in Table 2 are as follows:

Condition A. Five trained subjects. Individual experiments. Artificial
scope with clear uniform signals. Rectangular presentation.
Single-task elevation reporting.

Condition B. Nineteen untrained subjects. Group experiments
on a large screen. Same problem materials as Condition A. Rectangular
display. Single-task elevation reporting.

Condition C. Five trained subjects. Individual experiments. Artificial
scope with clear uniform signals. Sector presentation. Single-task
reporting.

Condition D. Nineteen untrained subjects. Group experiments
on a large screen. Same problem materials as in Condition C.
Sector display. Single-task reporting.

Condition E. Six trained subjects. Individual experiments. Simulator
reproductions of field radar. Typical pip variations in contour,
size, brightness, shape, and hazy edges. Sector display. Single-task
reporting.

Condition F. Six trained subjects. Individual experiments. Simulator
reproductions of field radar. Same problem materials as Condition
E. Sector display. Double-task reporting, alternating elevation
reports with range reports.

The standard deviation of error distributions appears at the base of
each column in Table 2, expressed both in scaled feet and in inches of
actual scope distance.
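
A standard deviation of this kind can be computed directly from a frequency table of errors grouped in 5-foot steps. The Python sketch below shows one way to do it. The frequencies are hypothetical and chosen only for illustration, and the use of the population formula (dividing by N) is an assumption, since the text does not say which formula was applied.

```python
import math


def grouped_sd(frequencies: dict) -> float:
    """Standard deviation of a grouped error distribution.

    `frequencies` maps each error value (scaled feet) to the number of
    readings at that value. Population formula (divide by N) is assumed.
    """
    total = sum(frequencies.values())
    mean = sum(error * count for error, count in frequencies.items()) / total
    variance = sum(
        count * (error - mean) ** 2 for error, count in frequencies.items()
    ) / total
    return math.sqrt(variance)


# Hypothetical central band of readings, not taken from Table 2.
hypothetical_band = {-10: 30, -5: 240, 0: 900, 5: 270, 10: 60}
print(f"Standard deviation: {grouped_sd(hypothetical_band):.2f} scaled feet")
```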

Conditions A, B, C, and D all involve artificial scope pictures with
clear, uniform signals. The conclusion that an average operator should
be able to interpret distances, under these conditions, to a standard
error of plus-or-minus 0.020 inch is again substantiated. If a radar
scope could be designed with such clear and uniform pips, and using
scaling of this degree of fineness, this gives the human expectancy.

Condition E, using reproductions of an actual radar scope, shows 
that the random error is about doubled, due to signals which vary in 
shape, size, intensity, haziness of edges, etc. In the artificial series the 
reports were ten seconds apart. In this simulator series the operator 
reported every tenth pip, with the scan-line crossing the scope once 
every second. Rate of reporting was approximately the same, therefore. 

Condition F is just like Condition E, except that the operator had to 
keep his attention on two tasks in alternation, elevation reporting and 
range reporting. The increase in standard error, from 9.8 feet to 13.3
feet, represents the effect of giving an operator an additional task. It 
may be presumed that the more tasks the radar operator is required to 
do simultaneously the less accurate he will be on each. This conclusion 
may seem to be something like proving the obvious, but it must be 
remembered that there is a proposal to make one man do what was 
previously done by from 3 to 5 men on GCA radar installations. The 
need for one-man operation is urgent, and the present study is merely an 
attempt to show that multiple tasks must be accompanied by extreme 
work simplification, if we are to avoid intolerable reporting errors. One 
confusion error, of the amount shown in Table 2 at the 100-foot line, 
could wreck an air transport.

Figure 6 shows the fit of a normal curve of distribution to the actual
error distribution for the data of Condition E, reproductions of actual
radar scopes.


Confusion Errors 


Scales, both linear and surface types, consisting of major lines with 
numerical values, and minor divisions which are supposed to assist in 


. 


Fig. 4. Type of fit for normal curve when errors at 25 ft., the position of
a minor scale division, have been included. (Observed frequency compared with a
normal curve of the same area and standard deviation. Abscissa: negative and
positive errors in scaled feet; 1 ft. = .004 scope inches.)


interpolation, are subject to mistaken interpretation of figures and errors
in counting division points.

Table 2 shows a clear existence of mistaken interpretation at the
100-foot value. This is verified by subjective reports, many times. The
100-foot line is called a 200-foot line, or the line of zero reference is
mistaken for a 100-foot side line. There was no case of an error as great as
t, but it was theoretically possible. 
Also, at the 75-foot, the 50-foot, and the 25-foot distances there is an 
probability of assigning wrong numerical interpretations. These 


Fig. 5. Hypothetical improvement of normal curve fit when errors at the 25 foot
position have been excluded. Presented to explain the χ² improvement shown
in Table 3. (Observed frequency compared with a normal curve fitted to the center
five steps; confusion errors lie near the +25 ft. and -25 ft. scaling lines.
Abscissa: negative and positive errors in scaled feet; 1 ft. = .004 scope inches.)


are fairly clear at 50 feet and up. Unfortunately the confusion errors 
at the value of 25 feet overlap with the curve of random error. In fact, 
there is no way of separating confusion from random errors, at this 
position, but there may be a statistical way of showing facts which 
support the belief that they must be there. 

Assuming that random error distributions should approach the curve
of normal probability, an hypothesis which has considerable support, and
that systematic errors will cause typical and expected distortions from
normalcy, we may resort to the χ² test for these data. And in this use of
the Fisher technique, it isn’t just the bald fact that a misfit has occurred,
but where in the curve the misfit is found, whether or not it is over the
values which correspond to the minor or major scale points, that should

Fig. 6. Normal curve fit, random errors. Elevation reports in double task experiments.
(Observed frequency compared with a normal probability curve of equal area and
standard deviation. Abscissa: error in scaled feet.)


be of interest in spying out the presence of confusion errors mixed
with random errors at the 25-foot distance.
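
The reasoning above, fitting a normal curve of the same area and standard deviation and then looking at where the misfit falls, can be sketched in Python as follows. This is an illustration under stated assumptions, not the computation used in the study: the observed frequencies are hypothetical, the 5-foot step width is taken from the layout of Table 2, and all function names are invented for the example.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative normal probability, using the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def moments(centers, observed):
    """Total count, mean, and standard deviation of a grouped distribution."""
    total = sum(observed)
    mean = sum(c * n for c, n in zip(centers, observed)) / total
    variance = sum(n * (c - mean) ** 2 for c, n in zip(centers, observed)) / total
    return total, mean, math.sqrt(variance)


def expected_counts(centers, width, total, mean, sd):
    """Counts a normal curve of equal area and SD would place in each step."""
    half = width / 2.0
    return [
        total * (normal_cdf(c + half, mean, sd) - normal_cdf(c - half, mean, sd))
        for c in centers
    ]


def chi_square_terms(observed, expected):
    """Contribution of each step to chi-square: (O - E)^2 / E."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


# Hypothetical readings in 5-foot steps from -35 to +35 scaled feet, with a
# small excess placed near the -25 and +25 foot scale lines.
centers = list(range(-35, 40, 5))
observed = [0, 1, 12, 3, 20, 150, 400, 800, 380, 160, 25, 4, 15, 1, 0]

total, mean, sd = moments(centers, observed)
expected = expected_counts(centers, 5.0, total, mean, sd)
terms = chi_square_terms(observed, expected)

for c, o, e, t in zip(centers, observed, expected, terms):
    print(f"{c:+4d} ft  observed {o:4d}  expected {e:8.2f}  chi-square term {t:10.2f}")
```

Printing the per-step terms, rather than only their sum, mirrors the point made in the text: what matters is which scale positions carry the misfit. In practice, sparse tail steps are usually pooled before summing.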

Figure 4 shows the typical result we get when we try to fit a normal
curve on our error distributions. The normal curve is plotted using the
standard deviation of the distance from -35 to +35 feet, which includes
confusion errors around 25 feet.

The χ² test always resulted in too many errors over the 25-foot position,
and the discrepancy was always positive for every distribution beginning
with Condition A through and including Condition E. This always
produced the appearance of a leptokurtic hump at the center.


Table 3
Artificial Scope Runs
χ² Tests of Curve Fit for a Normal Distribution of Error,
Central Band of Random Error


considered as being the band for pure random error (see Table 2), and that this region
should fit a normal probability curve. The fit to the central band is tried two ways:
with the supposed confusion errors included, and with the confusion errors excluded,
by computing the standard deviation only on the central band.


            Stand. Dev.   χ² Fit     Stand. Dev.   χ² Fit     Number of
            -35 to        Central    -15 to        Central    Readings,
Condition   +35 feet      Band       +15 feet      Band       Central Band
A           4.31          167.25     3.40          80.19      1499
B           5.10          24.28      4.65          5.89       1588
C           5.01          225.97     3.70          48.67      1613
D           5.30          97.01      4.40          9.92       1638
E           9.80          39.79      8.50          23.27      1280
F           13.33         1.78       -             -          -


step intervals. No attempt was made to improve the fit because this was already
as good as could be obtained. The area of this central band was 99.5% of the total
distribution.


Figure 5 shows our hypothesis of what would happen if we determined
the standard deviation by the central band of random error, only,
and deliberately assumed that the excess of readings over the 25-foot
point is due to confusion errors, not random errors.

Therefore, we adjusted the standard deviation value to fit the central
band of error values, from -15 to +15 feet, and applied the χ² test again.
The differences between the two assumptions are shown in final χ²
numbers in Table 3. Without exception, for the artificial scope series,
Conditions A to D inclusive, a χ² fit for 98% of the readings was greatly improved.




The astonishing thing was the discovery that the distribution for Condi- 
tion F, double task reporting, was already almost a perfect normal curve, 
and could not be improved by any assumptions of systematic distortion. 

We are inclined to believe, therefore, that the approximate bands 
for the regions of confusion errors in Table 2 are essentially correct. This 
means that, in reducing the random error by more finely divided scales, 
we have introduced an intolerable numerical confusion error, extremely 
dangerous for the practical navigation of aircraft by ground control radar. 
Therefore, no recommendation is made to use such a scale. More simpli- 
fied methods of signal tracking must be designed, especially for one-man 
operation. 


Persistence Errors 


A rather broad definition of a persistence error may be: It is the 
tendency of an operator to bias a present report because of the mental 
persistence of a previous report. 

We uncovered the existence of this possibility through two subjects 
whose data are plotted in Table 4. The first evidence was a sort of verbal 
stereotyping occurring when operators had to attend to two things 
alternately. Table 4 is drawn from the double-task experiments of
Condition F. 


Table 4 
Distributions of Errors 
Range Reports 
Reproductions of GCA Field Radar Scope 
Error 
in Initials of Subjects 
Scaled 
Miles R.C. D.M. D.M.S. J.H.F. W.A.S. C.B. Total 
+1.0 1 1 Band of 
ta 2 2 persistence 
I errors 
+6 1 1 
+5 1 1 2 
+4 
bes Band of ran- 


dom and con- 
fusion errors 


be tubers 50 68 46 49 70 
-2 14 5 2 13 


8 
s 
8 
| 
R 
8 
3 
~# 838 




The operator would be reporting consecutive range values, “six-point-two,
six-point-one, six-point-zero,” and when he passed into the five-mile
zone he went on, “six-point-nine, six-point-eight,” and then suddenly
remarked, “Oh, I meant five-point-eight.” This is essentially the situation
in Table 4.

This led us to wonder if something similar to this might not have been
happening to susceptible subjects, in the previous elevation serial
reports. Therefore, we computed the algebraic mean of errors following


Table 5 
Trend of Algebraic Mean Error in Relation to Previous Report 
Elevation Scale 
A plus sign means that the subject tended to veer his reports in the direction of the
previous report. A minus sign means that the subject tended to bias away from the
preceding report. The calculation is the difference in means where the preceding report
was higher as compared with readings where the previous report was lower. Figures in
italics are better than the one per cent level of significance. Differences are in scaled feet.


1. Group Experiments, Artificial Scope 
Difference Subject Difference Subject Difference 


+2.38 W.J. K. +.32 N.J. R. +.79 
—.24 L. E. K. +.18 R.K.S. +2.05 
+.45 K.M. +1.56 R. B.T. +1.91 
+.88 A. W. R. —.61 C.A. W. —1.37 
+1.18 A.P.R. —.28 M.S. W. +.21 
+1.07 
2. Individual Experiments, Artificial Scope 
+1.05 F. P. H. —.31 D. M.S. +.84 
+1.02 R.J.R. +1.01 


3. Simulator Reproductions of Field Radar 


J.H. F. +3.07 W. A.S. +4.60 
D. M.S. +.80 


larger previous values, and subtracted this from the algebraic means of
errors following smaller previous reports. This difference is susceptible
to calculation for reliability of differences of means. Table 5 shows the
results of this survey. Although only six out of twenty-six subjects
showed a significance of difference better than the one per cent level,
the general preponderance of plus values (20 out of 26) may carry some
weight.

Admitting that some subjects are susceptible to this effect, the size of




the error trend is actually too small to be of any serious consequence for 
the practical control of aircraft. A biasing effect of two feet, or even 
five feet, would not be intolerable. On range reporting it is conceivable 
that a mistake of one mile might be serious. 
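
The persistence measure of Table 5 can be sketched in Python as a difference between two mean errors. The sketch follows the sign convention printed with Table 5, where a plus value means the reports veer toward the previous report. Two points are assumptions made for illustration: a previous report counts as "higher" when it exceeds the true value of the current signal, and the sample series is hypothetical.

```python
from statistics import mean


def persistence_bias(true_values, reports):
    """Mean error after a higher previous report minus mean error after a lower one.

    A positive result means the reports lean toward the preceding report
    (the plus sign of Table 5). Classifying the previous report against the
    current true value is an assumption of this sketch.
    """
    after_higher, after_lower = [], []
    for i in range(1, len(reports)):
        error = reports[i] - true_values[i]
        previous = reports[i - 1]
        if previous > true_values[i]:
            after_higher.append(error)
        elif previous < true_values[i]:
            after_lower.append(error)
    if not after_higher or not after_lower:
        raise ValueError("need readings after both higher and lower previous reports")
    return mean(after_higher) - mean(after_lower)


# Hypothetical elevation series in scaled feet.
true_values = [100, 50, 100, 50, 100]
reports = [100, 55, 98, 52, 97]
print(persistence_bias(true_values, reports))  # 6.0
```

A reliability test on the difference of the two means, as mentioned in the text, would be applied to the two error lists before any subject is called susceptible.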


Summary 


1. The use of finer scaling, with minor scale division to tenths of an 
inch viewed at sixteen inches, reduces random errors to a standard devia- 
tion of a plus-or-minus 0.020 inches of scope distances, for clear uniform 
pips, and 0.040 inches of scope distance for reproductions of actual 
radar pips. 

2. The introduction of this finer scaling produces a proneness for 
confusion errors, defined as misinterpretation of the numerical values of 
scale positions. These errors may reach such a size as to endanger the 
navigation of an aircraft being guided by such operating reports. 

3. Requiring an operator to alternate between two tasks in rapid 
succession has the effect of increasing the size of the random error, in our 
situation, about 30%. 

4. Some subjects have a tendency to bias each report in a series by
the mental persistence of the previous report. Only a minority of 
subjects do this consistently, and the amount is relatively small for 
practical significance. 

5. Fine scaling, for one or more variables, is not recommended on the 
basis of present data for radar scopes. 


Received April 18, 1949. 


References 


1. Garner, W. R. Some Perceptual Problems in the Use of VG Remote PPI. Report of
Research under Contract with the Office of Research and Inventions, U. S. Navy,
166-I-2, 15 September 1946. Restricted. The Johns Hopkins Psychological
Laboratory. P. 34.

2. Ford, A., and Getz, M. H. Types of Errors in the Reading of GCA Scaled Scopes.
Technical Report No. 4, Contract W28-099-ac-130, Watson Laboratories, Air
Materiel Command, USAF, 31 August 1948. Restricted.


Some Design Factors in Making Settings on a Linear Scale *


William Leroy Jenkins and Minna B. Connor 
Lehigh University 


In setting a pointer on a linear scale by means of a control knob, is
there an optimal ratio between pointer movement and knob turn? Is
there an optimal knob diameter? When is a crank handle better than a
knob? What is the effect of backlash in the system? No previous
systematic investigation of such design factors seems to have been made.

The present study deals with a situation in which the operator is
required to match a designated position on the scale with his pointer,
rather than to set it to a specified numerical value. This limited phase
of the problem was chosen because it permits data to be gathered rapidly
and allows the accuracy of the setting to be objectively checked.

The primary criterion employed is the time consumed in making a
setting, since time is comparable from subject to subject, and from
condition to condition. In many of the experiments, action potentials from
the active forearm were also picked up and measured. However, action
potentials cannot be compared from subject to subject, since it is not
known that the efficiency of the pick-up is the same in all subjects.
For any given subject they do provide at least a rough indication of the
relative amount of muscular work involved under different conditions.


Apparatus 


Figure 1 is an operational diagram of the essential mechanical features
of the apparatus. Rotation of the control knob turns the lower set of


Different ratios are obtained by shifting the belt. When the clutch is 
engaged, rotation of the upper shaft turns the drum and thus moves the 


The linear scale consists of a black bakelite bar with vertical inserts
of lucite .032” wide at distances of 3, 9, 15, 21, 27, 33, 40, 56, 72, and 88
_sixteenths of an inch pema from the center. Behind each 


* This research was executed under Contract No. W28-099-ac-130 between the
Institute of Research, Lehigh University, and the USAF Air Materiel Command,
Watson Laboratories, Red Bank, N. J.




Through the center of the linear scale runs a thin metal strip which 
is used in checking the accuracy of setting. The pointer can be tipped 
to come in contact with the scale. If the pointer is entirely within the 
limits of a lucite insert, it will not touch the metal strip. If it is off the 
insert either way, it will come in contact with the metal strip and cause 
a red pilot lamp to light. The limit of error-tolerance is thus established 
by the width of the pointer. 
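The relation between pointer width and error-tolerance can be put as a small worked example. This is only an illustrative sketch: the function names are hypothetical, widths are given in thousandths of an inch to keep the arithmetic exact, and it assumes that a setting counts as accurate when the whole pointer lies inside the insert, edge contact included. The figures used are those reported in this paper (inserts .032” wide; a standard pointer .025” wide, giving a tolerance of .007”).

```python
def error_tolerance(insert_width, pointer_width):
    """Total play (thousandths of an inch) left for a pointer inside an insert."""
    if pointer_width > insert_width:
        raise ValueError("pointer is wider than the insert")
    return insert_width - pointer_width


def setting_is_accurate(offset, insert_width, pointer_width):
    """True if a pointer whose center lies `offset` thousandths from the
    center of the insert is entirely within the insert, so that it would
    not touch the metal strip (assumed geometry, not taken from the paper)."""
    return 2 * abs(offset) <= error_tolerance(insert_width, pointer_width)


# Standard conditions: .032" insert, .025" pointer.
print(error_tolerance(32, 25))          # 7, i.e. .007"
print(setting_is_accurate(3, 32, 25))   # True: pointer still inside the insert
print(setting_is_accurate(4, 32, 25))   # False: pointer overlaps the strip
```

Under this reading, a wider pointer leaves less play and therefore demands a more accurate setting.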

The mechanical system is without backlash and is so adjusted that 
the pointer remains exactly where it was set after the clutch is released. 
To maintain these conditions, the belts must be quite tight; so that there 
is noticeable resistance at extremely coarse ratios. With the mechanical 
advantage of ratios in the medium and finer ranges, however, the opera- 
tion requires very little effort. 


Fig. 1. Mechanical features: operational diagram.


For measuring time, two chronoscopes are used; so that time for 
travel to approximate location and time for making the final adjustment 
can be separately determined. Similarly, two condensers are used to 
accumulate amplified action potentials during the travel and adjust 
phases. (Details of the electrical circuits and the four-stage amplifier 
will be found in the Technical Summary Report of the project.)¹

¹ Jenkins, W. L. and Connor, M. B. Optimal Factors for Making a Setting on a
Linear Scale, Technical Report No. 3, Contract W28-099-ac-130, Watson
Laboratories, Air Materiel Command, USAF, 30 June 1948, Restricted.




Procedure 


The procedure was essentially the same for all experiments. During
a typical two-hour experimental session six or seven runs can be completed.
Each run consists of a series of 20 settings, involving all 20 of the
lucite inserts in a scrambled order. The procedure for a single setting
is as follows:


After giving a preliminary warning signal, the experimenter closes
a switch which simultaneously: (a) lights a pre-selected lucite insert;
(b) starts both chronoscopes; (c) begins the accumulation of amplified
action potentials in the first condenser.

As soon as he sees an insert light up, the subject starts turning
the knob to bring the pointer from the center of the scale to the designated
position. When the pointer comes within one tenth of an inch of the
lighted insert, a contact is automatically closed which simultaneously:
(a) stops one chronoscope; (b) switches the accumulation of action
potentials from the first to the second condenser. Thus the first
chronoscope and the first condenser provide measurement of the TRAVEL
time and potential.

When the subject has completed his setting, he pushes the clutch
with his non-operating hand. This action simultaneously: (a) stops
the second chronoscope; (b) cuts the second condenser out of the
circuit. Thus the second chronoscope and the second condenser provide
the ADJUST measurements.


the pointer against the scale. (Errors occur so rarely that the
occasional “red light” reading is simply discarded.) The experimenter
records the readings of both chronoscopes, and discharges each
condenser separately into a sensitive ballistic galvanometer. The appa-


Method of Analyzing Data

The raw data are in the form of time readings in tenth-seconds and
potential readings in arbitrary meter-scale units, for the travel
and the adjust phases of each setting. The adjust readings cause
no difficulty because they can be averaged directly. However, travel
readings vary according to the distance of the insert from the center.
The travel readings are first plotted against distance traveled and a
straight line fitted. (The slope of this line is actually the travel rate, and
the intercept an estimate of the starting time or potential.) Then the
travel time (or potential) is scaled off for two standard distances:
10 sixteenths and 50 sixteenths of an inch. (The former is probably
more representative of the usual amount of movement required in making



discrete adjustments.) Mean total time (or potential) = mean travel 
+ mean adjust. 
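The analysis just described can be sketched in a few lines. This is an illustration only: the paper does not say how the straight line was fitted, so ordinary least squares is assumed here, the function names are hypothetical, and the travel and adjust readings below are invented for the example. Only the insert distances are taken from the apparatus description.

```python
from statistics import mean


def fit_line(distances, readings):
    """Least-squares straight line through the travel readings.

    Returns (slope, intercept). In the paper's terms the slope is the
    travel rate and the intercept estimates the starting time or potential.
    """
    mean_x, mean_y = mean(distances), mean(readings)
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, readings))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_total(distances, travel, adjust, standard_distance):
    """Travel reading scaled off at a standard distance, plus mean adjust."""
    slope, intercept = fit_line(distances, travel)
    return intercept + slope * standard_distance + mean(adjust)


# Insert distances in sixteenths of an inch (from the apparatus description);
# the readings in tenth-seconds are hypothetical.
distances = [3, 9, 15, 21, 27, 33, 40, 56, 72, 88]
travel = [5.0 + 0.2 * d for d in distances]
adjust = [10.0] * len(distances)

for standard in (10, 50):
    print(standard, round(mean_total(distances, travel, adjust, standard), 2))
```

Scaling every run to the same two distances is what allows mean totals to be compared across ratios, even though the individual settings cover different amounts of travel.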


Subjects 


Two former Navy radar operators (DMS and HWQ) were used in 
all of the experiments. Two other young men (JDS and RFM) with 
no such prior experience were available only for certain parts of the study. 
These four subjects are right-handed. The young woman (JKD) used 
in the study is naturally left-handed but was required to make settings 
with her right hand. She also had had no particular mechanical
background.


Table 1
Influence of Ratio on Time and Potential
Standard Conditions

Mean Total Time
            10 Sixteenths Travel              50 Sixteenths Travel
Ratio    DMS     HWQ     JKD     RFM       DMS     HWQ     JKD     RFM
 .220    25.2*   29.0*   24.0*   —         75.6*   66.6*   53.6*   —
 .454    17.5    24.1*   [?]     35.1*     39.5*   42.9*   37.9*   72.3*
 .766    18.0    22.6    22.4*   32.2      31.2*   35.4*   32.4*   52.7*
 1.18    16.3    19.5    19.4    30.8      24.8    28.7    25.8    44.8
 2.42    19.1*   21.6*   22.0*   29.1      27.1*   26.0*   25.6    38.7
 4.08    19.2    20.2    23.9*   35.4      23.6    [?]     27.9    42.2
 6.28    19.5*   23.1*   26.7*   37.3*     23.5    27.5*   30.7*   43.3
 9.70    23.8*   25.3*   28.1*   37.3*     26.6    28.9*   32.5*   42.1
 16.3    32.8*   33.3*   37.2*   47.4*     34.4*   36.5*   42.4*   52.2*
 33.6    [?]     —       [?]     —         57.9    —       73.0    —

Mean Total Potential
            10 Sixteenths Travel              50 Sixteenths Travel
Ratio    DMS     HWQ     JKD     RFM       DMS     HWQ     JKD     RFM
 .220    24.3*   20.9*   26.9*   —         [?]     73.7    57.3*   —
 .454    16.8*   20.8    19.5    27.3*     41.6*   46.8*   36.7*   64.5*
 .766    15.3    19.5    19.0    22.1      28.5*   35.1*   29.4    42.9*
 1.18    14.4    19.7    [?]     [?]       23.2    28.1    28.3    36.8
 2.42    [?]     16.4    21.2    17.5      25.1    22.6    26.8    26.3
 4.08    16.5*   18.4    20.5    20.5      21.3    [?]     24.9    26.6
 6.28    [?]     16.4    21.9    25.8      22.1    21.8    27.5    30.6
 9.70    19.7*   18.4    22.6*   26.6      [?]     22.0    27.0    30.2
 16.3    [?]     [?]     [?]     33.4      26.9*   25.0    34.3*   37.8
 33.6    25.4*   —       38.3*   —         28.1    —       43.9    —




Standard Conditions

The following conditions were standard in all experiments, unless
specific exception is noted:

Linear scale: At eye level and normal reading distance.

Control knob: At waist level of seated subject; right-hand operation;
2¾” diameter knob.

Error-tolerance: .007” (pointer width of .025”).

Ratio: Expressed in inches of pointer movement for one complete
turn of the knob.

Mean total time is expressed in tenth-seconds for 10 sixteenths or 50
sixteenths travel distance. Mean total potential is expressed in meter-scale
readings which have no absolute significance but are comparable
for different conditions in the same subject. Each mean is based on
a minimum of 80 readings. In tables showing italicized values, an


Fig. 2. Influence of ratio: standard conditions.




asterisk (*) indicates figures which differ significantly from the
italicized values, beyond the 1% level of confidence.


Results 


Influence of Ratio. Is there an optimal ratio? Table 1 shows mean 
total time and mean total potential for ten ratios varying from .220
to 33.6 inches of pointer movement for one complete turn of the control 
knob. Although the subjects differ in their general levels, it is evident 
that the optimum is in the neighborhood of 1.18 in terms of both time and 
potential. 
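The reading of an optimum off the table can be illustrated with one column of it. The sketch below uses the mean total times for subject DMS at 10 sixteenths travel, ratios .220 through 9.70, as they appear in Table 1 (the same values recur in the Oct. ’47 column of Table 2). The function names and the two-tenth-second margin are hypothetical conveniences for the example; they are not the significance test used in the paper.

```python
# Mean total time in tenth-seconds, subject DMS, 10 sixteenths travel.
dms_time = {
    0.220: 25.2, 0.454: 17.5, 0.766: 18.0, 1.18: 16.3,
    2.42: 19.1, 4.08: 19.2, 6.28: 19.5, 9.70: 23.8,
}


def fastest_ratio(times_by_ratio):
    """Ratio with the shortest mean total time."""
    return min(times_by_ratio, key=times_by_ratio.get)


def near_optimal(times_by_ratio, margin):
    """Ratios whose mean total time is within `margin` of the shortest."""
    best = min(times_by_ratio.values())
    return sorted(r for r, t in times_by_ratio.items() if t - best <= margin)


print(fastest_ratio(dms_time))        # 1.18
print(near_optimal(dms_time, 2.0))    # [0.454, 0.766, 1.18]
```

The second call is a reminder that the minimum sits in a fairly flat region rather than at a sharp point.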

Figure 2 shows why the optimal ratio is in this region. For all 
subjects, travel time declines rapidly with increasing coarseness to about 
1.18; thereafter coarser ratios do not speed up travel materially. In the 
opposite fashion, adjusting time declines with decreasing coarseness of 
ratio to about 1.18; thereafter finer ratios do not aid in making the final 
adjustment. A ratio of about 1.18 combines rapidity of travel with speed
of final adjustment.

For convenience in the remainder of this report we shall refer to 1.18 
as “the optimal ratio.” This should not be interpreted too literally. 
Actually there is an optimal region which holds good for all the subjects 
tested. Well-practiced subjects can use coarser ratios without undue 
loss, but the ratio designated as optimal has proved satisfactory for 
novice and expert alike.


Table 2
Stability of the Optimal Ratio
Standard conditions except that Feb. ’47 figures were obtained with a 2” diameter
knob.

Mean Total Time
10 Sixteenths Travel

Subject: DMS
Ratio    Feb. ’47   Apr. ’47   May ’47   Oct. ’47   Mar. ’48
 .220    28.1       —          20.6      25.2       —
 .454    20.3       —          22.4      17.5       —
 .766    18.3       —          20.7      18.0       23.2
 1.18    16.9       20.2       18.4      16.3       20.5
 2.42    16.7       24.1       22.1      19.1       [?]
 4.08    18.4       26.5       22.7      19.2       22.6
 6.28    19.8       27.7       21.7      19.5       [?]
 9.70    21.8       29.0       —         23.8       27.2
 16.3    [?]        [?]        [?]       [?]        [?]

Subject: HWQ
Ratio    Feb. ’47   Apr. ’47   May ’47   Oct. ’47   Mar. ’48
 .220    29.0       —          33.9      29.0       —
 .454    22.4       —          [?]       22.1       —
 .766    20.4       —          25.7      22.6       27.6
 1.18    20.3       24.3       23.7      19.5       22.3
 2.42    20.3       28.9       23.1      21.6       21.1
 4.08    20.7       26.2       25.3      20.2       23.5
 6.28    [?]        33.3       28.5      23.1       25.5
 9.70    25.7       34.6       —         25.3       31.4
 16.3    [?]        [?]        [?]       33.3       —




An indication of the stability of the optimal ratio over a period of time
is presented in Table 2, which shows data for two subjects gathered on
different occasions over a period of thirteen months. Although the
performance fluctuates from time to time, the optimal ratio
remains in the same region.

Table 3 shows that the optimal ratio holds good for both the dominant
and non-dominant hand. (To obtain these figures, a left-hand and a
right-hand knob were coupled with auxiliary belts, so that the pointer
could be set with either hand.) Particularly interesting here are the data
for subject JKD. Although naturally left-handed, JKD had by this
time become well-practiced in right-handed operation of the apparatus.
At unfavorably high ratios she was now able to make faster settings with
her right hand. Around the optimal ratio, the two hands were equally


Table 3
Ratios in Right vs. Left-Hand Operation
Standard conditions except that identical right and left hand knobs were coupled
by belt so that either could be used.

Mean Total Time
10 Sixteenths Travel
              DMS               HWQ               JKD
Ratio     Right   Left      Right   Left      Right   Left
[?]       22.2    24.4      25.5    29.5      25.0*   24.9
[?]       21.3    24.6      24.8    28.0      22.6    23.6
[?]       21.0    24.3      24.7    26.1      24.7    22.4
[?]       23.7*   25.0      25.1    26.4      27.6*   30.3*
[?]       26.0*   29.3*     29.8*   31.5*     27.7*   33.7*
[?]       29.2*   38.0*     37.6*   38.6*     31.6*   36.4*

* Significantly different from ratio 1.18.


Influence of Knob Diameter. In a preliminary study on two subjects,
fourteen knob diameters were tested with five different ratios.
For clarity in presentation the fourteen diameters are grouped in five
step intervals. Table 4 gives the mean total time for 10 sixteenths travel
distance. Several points of interest appear: (1) Regardless of knob
diameter, the optimal ratio remains in the neighborhood of 1.18. (2) It
is apparently not possible to compensate for an unfavorable ratio by
altering the size of the control knob. Notice that the fastest times for
ratio 6.28 are longer than the slowest times for ratio 1.18. (3) With
coarse ratios the larger knob diameters work better. (4) At the optimal
ratio, knob diameter appears to make very little difference.




As a check on this last point, five knob diameters were studied at the
optimal ratio, using four subjects. Table 5 shows the results for both
time and potential. In terms of mean total time, only the half-inch
diameter is clearly unfavorable for all subjects, and the one-inch diameter
mildly so for two of them. In terms of action potential, the 2¾”
diameter is significantly superior to the smaller sizes, although not always
to the 4” diameter.


Table 4
Interaction of Knob Diameter and Ratio
Standard conditions except that series of knob diameters were combined with series
of ratios as indicated.

Mean Total Time
10 Sixteenths Travel

Subject: HWQ
Knob            Ratio   Ratio   Ratio   Ratio   Ratio
Diameters       1.18    2.42    4.08    6.28    9.70
[?]             29.2    —       —       46.4    —
1, 1¼, 1½       24.1    26.8    26.8    35.1    40.2
1¾, 2, 2¼       22.6    25.3    25.6    31.2    34.2
2½, 2¾, 3       23.6    27.0    25.7    31.6    33.0
[?], 3¾, 4      24.3    27.3    25.0    30.8    30.7

Subject: DMS
Knob            Ratio   Ratio   Ratio   Ratio   Ratio
Diameters       1.18    2.42    4.08    6.28    9.70
[?]             21.5    —       —       34.1    —
1, 1¼, 1½       21.5    24.3    30.0    37.5    33.7
1¾, 2, 2¼       22.5    22.2    26.6    34.5    28.3
2½, 2¾, 3       21.6    22.5    26.3    28.4    29.6
[?], 3¾, 4      23.2    22.4    25.7    29.9    27.2


Figure 3 shows travel time and adjusting time separately. The half-inch
diameter yields longer times for both travel and adjusting in all
subjects. Among the larger sizes there is little to choose. It appears 
that the critical motion is the twist of the forearm, not the movement of 
the finger tips. Practically speaking, as long as the optimal ratio is used, 
the exact knob diameter does not matter, unless it is too small or too 
large to be grasped conveniently. The standard 2¾” size used in most
of our experiments was adopted simply because most subjects expressed 
a preference for this size. 

Influence of Crank Handle. Cranks are generally used in tracking 
operations. The question has been raised whether a crank is better than 
a knob for making discrete settings involving large amounts of travel. 




Table 5
Influence of Knob Diameter at Optimal Ratio
Standard conditions except that series of knob diameters were combined with ratio
1.18.

Mean Total Time
              10 Sixteenths Travel              50 Sixteenths Travel
Diameter   DMS     HWQ     JKD     RFM       DMS     HWQ     JKD     RFM
½”         25.3*   28.1*   26.3*   42.1*     35.3*   38.1*   35.5*   53.7*
1”         23.1    23.0*   22.0    39.3*     31.5    29.4    29.6    46.1*
[?]        21.1    22.9*   23.0    35.2      30.3    29.3    29.4    44.0*
2¾”        21.9    20.8    22.1    34.5      28.7    28.0    27.8    38.9
4”         21.2    21.8    21.8    37.6      27.6    29.4    26.6    46.6*

Mean Total Potential
              10 Sixteenths Travel              50 Sixteenths Travel
Diameter   DMS     HWQ     JKD     RFM       DMS     HWQ     JKD     RFM
½”         31.4*   29.2*   38.0*   33.4*     44.6*   40.0*   50.8*   46.2*
1”         30.9*   24.2    33.0*   27.1*     44.1*   35.0    44.2*   35.9*
[?]        26.0*   25.5*   27.6    22.3*     38.4*   36.3*   39.2*   32.3*
2¾”        23.4    22.6    26.0    18.5      33.0    33.8    36.8    22.5
4”         21.7    26.7*   24.9    13.6      31.7    37.5    35.2    19.6

* Significantly different from 2¾”.


Fig. 3. Influence of knob diameter: standard conditions.




To study this problem the 2¾” knob was drilled so that a crank handle
could be attached 14” from the periphery. Time measurements were
taken at seven ratios under the following conditions: (1) Knob alone as
a control; (2) crank attached and its use required; (3) crank attached
but its use optional.

Table 6 shows mean total time for 50 sixteenths travel distance, 
which should give the crank the maximum advantage. Two interesting 
points appear: (1) Although the crank speeds up setting at ratios below 
1.18, it does not enable these ratios to compete with the optimal ratio 
and the simple knob. (2) At the optimal ratio, the forced use of the 
crank is definitely deleterious and even its mere presence appears to 
hamper the best performance. Within the limitations of these experi- 
ments, at any rate, it appears that a crank handle serves no function 
whatever in making discrete settings on a linear scale. 


Table 6
Comparison of Knob and Crank
Standard conditions except each mean based on a minimum of 40 readings. Crank
simulated by attaching crank-handle to periphery of 2¾” knob. In the table: KNOB
means knob alone; CRANK means use of crank required; OPT means crank-handle
present but use optional.

Mean Total Time
50 Sixteenths Travel
           Subject: DMS           Subject: HWQ           Subject: JDS
Ratio    KNOB   CRANK  OPT      KNOB   CRANK  OPT      KNOB   CRANK  OPT
 .220    81.2   52.6   54.8     73.5   58.5   55.1     103.6  50.1   52.8
 .454    52.7   35.6   36.7     48.0   42.5   39.9     64.6   40.1   38.7
 .766    37.7   30.2   31.0     40.3   39.9   35.4     45.3   33.4   29.0
 1.18    25.6   32.7   32.5     30.6   38.6   31.7     29.0   34.3   32.7
 2.42    26.0   33.5   26.7     27.8   39.8   36.2     29.8   36.2   32.1
 4.08    26.8   45.8   29.6     30.0   45.6   34.0     29.0   44.0   32.0
 6.28    24.6   43.8   29.1     32.8   61.7   32.7     31.2   43.7   33.1


Influence of Backlash. Backlash is unavoidably present in some 
equipment. What is its influence on the speed of making settings? To 
study this question, the apparatus was modified by the addition of an 
arm moving between adjustable stops immediately beyond the subject’s 
control knob, so that varying degrees of backlash could be introduced. 
In a preliminary series with two subjects, backlash was tested in 1° 
steps from 0° to 20° in the expectation that some particular amount of 
backlash might prove to be critical. Since this expectation was not 
realized, the figures have been grouped into seven step intervals. Table 
7 shows mean total time for 10 sixteenths travel at ratios 1.18 and 6.28. 




Backlash appears to have very little effect, even at the
unfavorably coarse ratio of 6.28.

As a further check, backlash of 0°, 4°, 8°, 12°, and 16° was tested with
three subjects using the optimal ratio. Results are given for mean total
time and mean total potential in Table 8. Again it seems that no
substantial effect of backlash can be found in either time or action potential.
There is a slight upward trend with increasing backlash, but the
statistically significant differences are scattered spottily and unconvincingly
throughout the table. Figure 4 indicates that the slight upward trend
arises from a minor lengthening of adjusting time, while travel time
is unaffected.


Table 7
Interaction of Backlash and Ratio
Standard conditions. Varying degrees of backlash introduced by means of an arm
working between adjustable stops, immediately beyond subject’s control knob.

Mean Total Time
10 Sixteenths Travel
              Subject: DMS        Subject: HWQ
              Ratio    Ratio      Ratio    Ratio
Backlash      1.18     6.28       1.18     6.28
[?]           23.1     27.8       24.4     29.2
[?]           23.2     30.1       24.9     28.1
[?]           23.8     32.5       25.8     28.7
[?]           25.4     33.0       26.4     30.1
[?]           25.1     32.7       26.4     32.2
[?]           26.1     32.5       26.2     30.7
[?]           26.5     33.3       26.6     29.7


We are reluctant to draw the sweeping conclusion that backlash is
entirely unimportant under all conditions. Perhaps with excessive friction
and inertia, perhaps when far greater accuracy than .007” is demanded,
backlash may prove more disturbing than in the present experiments.
These are questions for further research to answer.

Influence of Error-Tolerance. How much does it slow up an operator
to demand greater accuracy in setting? In our apparatus the error-tolerance
could be altered simply by changing the width of the pointer
in relation to the width of the lucite inserts. In a preliminary series,
seven pointer-widths were tested. Table 9 shows the results in terms
of mean total time for 10 sixteenths travel distance. At the optimal
ratio only subject DMS shows a marked lengthening of time with
decreasing tolerance; but at ratio 6.28 all three subjects show the same effect,


Table 8
Influence of Backlash at Optimal Ratio
Standard conditions. Varying degrees of backlash introduced by means of an arm
working between adjustable stops immediately beyond subject’s control knob.

Mean Total Time
              10 Sixteenths Travel       50 Sixteenths Travel
Backlash    DMS     HWQ     JKD        DMS     HWQ     JKD
None        21.9    22.9    23.7       31.1    30.9    31.7
4°          22.0    23.8    23.4       30.4    31.0    31.4
8°          23.4    25.5*   26.6       32.6    33.5    34.6
12°         24.2*   24.1    28.6*      34.2*   31.7    37.4*
16°         26.8    24.5    26.6       36.4*   33.3    34.6

Mean Total Potential
              10 Sixteenths Travel       50 Sixteenths Travel
Backlash    DMS     HWQ     JKD        DMS     HWQ     JKD
None        25.7    28.9    32.9       38.9    36.7    43.7
4°          28.0*   24.2    31.2       40.8    36.2    42.0
8°          26.4    26.6*   32.7       39.2    37.0    45.5
12°         26.7    25.3    33.9       39.5    37.3    46.3
16°         29.0*   26.5*   32.7       42.2    37.7    45.5

* Significantly different from None.
Fig. 4. Influence of backlash: standard conditions.




Table 9
Interaction of Tolerance and Ratio
Standard conditions except that knob diameter is 2”.

Mean Total Time
10 Sixteenths Travel
              Subject: DMS       Subject: HWQ       Subject: JDS
              Ratio    Ratio     Ratio    Ratio     Ratio    Ratio
Tolerance     1.18     6.28      1.18     6.28      1.18     6.28
[?]           17.0     17.8      19.5     21.7      —        —
[?]           16.5     21.0      20.7     23.2      22.1     27.9
[?]           18.2     24.4      22.7     27.7      22.7     30.1
[?]           24.2     28.6      22.7     30.0      24.2     30.0
[?]           26.5     37.1      24.1     29.5      29.9     33.4
[?]           30.0     52.1      25.2     33.2      32.7     39.9
[?]           35.3     50.2      29.0     39.2      33.9     40.5


optimal ratio, measuring both time and potential. Table 10 gives
the results. There is evidence of a moderate lengthening of time from
[?]” to .005”; then a sharp break at .003”. From the reports of the
subjects, it appears that .003” represents a breaking-point at which it


Table 10
Influence of Tolerance at Optimal Ratio
Standard conditions except that series of error-tolerances were tested at ratio of 1.18.

Mean Total Time
               10 Sixteenths Travel              50 Sixteenths Travel
Tolerance   DMS     HWQ     JKD     RFM       DMS     HWQ     JKD     RFM
[?]         15.8*   19.0*   16.6*   27.9*     22.8    25.4*   23.0    38.3*
[?]         17.1    19.5*   18.3    31.4      23.9    26.3    24.7    40.2
[?]         17.5    22.6    19.8    34.6      24.7    27.8    25.0    45.0
[?]         20.7*   23.4    21.8*   38.1      27.9    31.0*   27.4    48.9
[?]         27.7*   30.4*   25.9*   51.6*     33.3*   37.2*   32.3*   61.6*

Mean Total Potential
               10 Sixteenths Travel              50 Sixteenths Travel
Tolerance   DMS     HWQ     JKD     RFM       DMS     HWQ     JKD     RFM
[?]         14.1*   14.3*   21.4*   19.6      22.1*   22.7    30.6    30.0
[?]         14.9    15.7    22.0*   19.5      23.2    22.9    31.6    29.9
[?]         15.9    15.8    24.8    21.6      24.3    28.0    33.6    32.8
[?]         17.2    19.6*   23.4    23.4      25.2    27.2*   31.8    34.6
[?]         21.5*   23.7*   27.6*   27.2*     29.0*   30.5*   36.4    37.6*

becomes perceptually impossible to judge whether the pointer is
accurately positioned. This is borne out by the fact that only at this level of
tolerance did the subjects have an appreciable number of “red lights”
(indicating that the clutch was released when the pointer was not within
the confines of the lucite insert).

Figure 5 shows, as might be expected, that error-tolerance does not 

time. Adjusting time increases slowly as tolerance do- 

upward break at .003”. 




1. The optimal ratio is one or two inches of pointer movement for one
complete turn of the knob, for either the dominant or non-dominant
hand. Finer ratios waste time and effort in traveling to the approximate
location. Coarser ratios are clumsy for making the final adjustment.
No other design factor investigated is as important as the optimal ratio.

2. Knob diameter is relatively unimportant, as long as the knob is
large enough to be grasped conveniently. An unfavorably coarse ratio
cannot be compensated for by altering the size of the control knob.

3. An unfavorably fine ratio cannot be compensated for by substituting
a crank handle for the control knob. When the optimal ratio
is used, the addition of a crank handle to the knob does not aid and
may be actually harmful, even when its use is optional.

4. Backlash, even in excessive amounts, has a relatively minor
influence on either time or potential at the optimal ratio, under the
conditions of this experiment. This may not be true under conditions of
excessive friction and inertia, or when a tolerance much finer than .007” is
demanded.

5. Demanding greater accuracy of the subject by reducing the
permitted error-tolerance increases time and potential only moderately, as
long as the optimal ratio is employed. The final limit of accuracy in the
present experiments appeared to be set by the perceptual difficulty of
centering a pointer of appreciable thickness on a lighted insert, rather
than by the limits of motor control.




Book Reviews 


Lewin, Kurt (Edited by Gertrude Weiss Lewin). Resolving social con- 
flicts. Selected Papers on Group Dynamics. New York: Harper 
and Brothers, 1948. xviii-230. 

In his Foreword Gordon Allport writes such an excellent review of
this book that the temptation to quote him liberally is too strong to be
resisted. The thirteen papers, all previously published elsewhere, are,
he says, “so well-selected and so adroitly arranged that they provide an 
excellent introduction to Lewin’s system of thought” (p. XIV). “The 
unifying theme is unmistakable: the group to which an individual belongs 
is the ground for his perceptions, his feelings, and his actions. Most 
psychologists are so preoccupied with the salient features of the indi- 
vidual’s mental life that they are prone to forget it is the ground of the 
social group that gives to the individual his figured character. . . . This 
interdependence of the ground and the figured flow is inescapable, inti- 
mate, dynamic, but it is also elusive” (p. VII f.). 

“Lewin’s outstanding contribution is his demonstration that the 
interdependence of the individual and the group can be studied in better 
balance if we employ certain new concepts. Although the present volume 
contains primarily papers having a concrete, case-anchored character, 
still each shows with clarity how fruitful these new concepts are for under- 
standing the phenomenon in question” (p. VIII.). Here, I think, we 
must be more cautious than Allport. We do not, it is true, quite share 
the objection sometimes made that Lewin’s terminology is “meta- 
phorical.” All description consists in calling attention to similarities, 
and all terms are therefore metaphorical. What we should ask of the
scientist proposing a new term is that he make clear the limits of
generality involved. Is psychological or “life space” like geometric space
in every respect? Lewin says it has all the qualities ascribed to space in
non-quantified geometry (i.e., in topology). Presumably the life space
has some but not all the characteristics of the more-familiar Euclidean
space. Thus the term will for a long time have for us a strongly analogical
coloration; it will suggest, that is, some properties which it does not have.

The merits of such a new way of describing facts must not, however,
be overlooked. “Psychological or life space” suggests parallels which
are actually confirmable hypotheses. The volume of significant
research which has been set in motion by Lewin’s array of terms is a tribute
to their provisional utility. It is the reviewer’s belief that they will




A more basic question comes when we consider the terms as
explanatory concepts or constructs. Here fecundity in suggesting
hypotheses is not an adequate criterion. Nor can we accept Allport’s
criterion of “understanding the phenomena in question.” It is rare for
concepts to seem unworkable in the concrete situation to which their 
own author seeks to apply them. A construct must prove itself in terms 
tability in systematically varying conditions. In social psychology it 
may be years before the constructs can be tested in the requisite variety 
of critical situations.
Meanwhile, we do find here provocative interpretations of current
problem situations. Part I deals chiefly with the problem of democratic
education with particular reference to Germany. Part II deals with
“Conflicts in Face-to-Face Groups.” Part III, dealing chiefly with
minority group problems, is somewhat more miscellaneous. The last
chapter is significant because it reveals Lewin right up to the moment of
his untimely death striving to see how, through action research, his
hypotheses could be put to a genuinely experimental test. All persons
interested in social engineering will find stimulation in this book.

Horace B. English 
Ohio State University 


Yoder, Dale, Paterson, Donald G., et al. Local labor market research.
Minneapolis, Minnesota: University of Minnesota Press, 1948. Pp. 
xvii, 226. $3.50. 

"Early in 1939 officials of the city of St. Paul, Minnesota became aware 
"an apparent paradox. Although employment had been restored to 
proportions equal to those of the predepression period, relief loads and
expenditures continued at the high levels typical of the depression years.
A Mayor's Committee on Unemployment studied the problem but found
no satisfactory explanation. Finally the committee turned to social
scientists at the University of Minnesota for help with the problem. In
the early 30’s the Employment Stabilization Research Institute of the
University had made a series of significant studies of employment and
unemployment and was thus uniquely equipped in 1940 to attack the
immediate problem facing the city of St. Paul. The story of the research
efforts of the ESRI during the years 1940-42 is reported in Local Labor
Market Research.

The significance for psychologists of this account arises in part from 
the cooperative nature of the enterprise since the research staff included 
psychologists as well as economists, sociologists, and statisticians. Much
of the methodology will interest applied psychologists in the fields of




opinion polling, counseling, and personnel administration. Finally, the
findings, particularly of Project 3, constitute important new contributions
to personnel psychology.

After a one-year pilot study it was apparent that the research program 
should include a comprehensive study of the labor marketing process. 
Five projects were selected for study. 

Project 1 appraised available employment data particularly those of 
state and federal agencies and attempted to improve these labor market 
reports as a means of providing continuing indices of employment, hours, 
wage rates, and earnings. These measures were based on employer 
reports to various public and private agencies and covered only the 
employed. 

Project 2 sought to provide detailed information on the numbers and 
types of labor supplies available and to serve as a check on the data ob- 
tained in the first project. In addition, special studies were undertaken 
to obtain information regarding priorities unemployment, civilian morale, 
nature and extent of vocational training, housing, shopping habits, trans- 
portation, and migration. The method was a continuous sampling survey 
using both a panel and randomly selected respondents in St. Paul. The 
result is an impressive demonstration of the use of sampling techniques 
in maintaining a continuing check on the dynamic elements of the labor 
market and in providing basic information on a wide range of community 
problems.

Project 3 concerned itself with some of the frictions in the labor market 
which interfere with the matching of men, women, and jobs. Psycho- 
logical tests, interviews, and attitude surveys were among the tools used 
in studying the human factor in employment. 

An attempt was made to relate available employment data to training 
opportunities available in the community. Data on school enrollments 
and the employment experiences of post-graduate youth were collected. 
The findings raised the question as to how well the public school system
had fulfilled its responsibilities for vocational training. 

Opinion polling methods were used to identify and measure attitudes 
and attitude changes among various occupational groups. Findings 
indicated that members of the labor market often held opinions at 
variance with the facts and this doubtless accounted for some of the labor 
force frictions. It was possible to get some idea of the job satisfaction of 
various occupational groups through this polling approach. Questions 
regarding public policy such as “Do you think it is too easy for people to 
get on relief?” got at attitudes which indirectly affect employment policies.

Of great interest to personnel psychologists is that portion of the 
study which compared the occupational classification assigned on the 




basis of intensive clinical study to unemployed job seekers with those
classifications routinely assigned by employment office interviewers.
Results indicate that clinical study rather than a superficial appraisal
based primarily on past job experience will identify a considerable
number of persons whose potentialities for employment otherwise go
undiscovered.
This intensive clinical study of almost four hundred unemployed 
yielded other useful information. For example, it was found that 
ing letters may be useful in a large-scale counseling program where 
for interviews is at a premium. Other analyses gave information 
on the dominant causes of unemployment. 
A follow-up study of persons tested and studied clinically ten years
previously indicated that occupational adjustment can be predicted with
ing accuracy. Re-tests on these same people gave amazingly 
re-test correlations on pencil and paper tests being about .9. Correlations
for performance tests were somewhat lower, being in the neighborhood
of .6 to .7.
Project 4 was an attempt to tease out some of the complex interrelationships
which influence the demand for labor. Analyses of economic
and opinion surveys were the methods used. The latter sought to 
I and classify employers’ opinions as to how and why they make 
decisions to offer employment. In a study of the printing industry the
employees were also polled to ascertain any divergencies.
Project 5 was an analysis of relief administration policies and practices
on the assumption that factors other than those of the labor market might
be responsible for the St. Paul paradox of increasing employment without
an accompanying decrease in relief rolls. Analyses of official reports of
work agencies provided one source of data. A major part of the
study, however, was an intensive analysis of the characteristics of relief
recipients. Finally, detailed study was made of fifteen relief clients for
whom a great deal of information was available as a result of their participation
in the occupational analysis work of Project 3.
The findings of Project 5, as a whole, indicated that the nature and
conditions of relief administration were an important factor in the
situation. It seemed clear that relief expenditures reflected much more
than the current condition of local labor markets.
Both the conduct of this research program and the nature and form
of publication were materially affected by the war. Changes in personnel
and finally the withdrawal of foundation support because of war conditions
brought the study to an end before it really was completed. Thus
the book is more of a progress report emphasizing methodology than it
is a definitive statement of the findings. The compilation and publication




of this report actually was undertaken by the Industrial Relations Center 
established at the University of Minnesota in 1945. It is the work of 
many authors and reflects some of the obvious limitations. Credit for 
a careful editing should go, however, to Herbert G. Heneman, Jr. 

This is a unique and important contribution to labor market research 
and is a milestone marking the road which psychology is traveling 
toward cooperative research on meaningful problems. At a time when 
“action research” has become a fashionable term among social scientists 
the reviewer judges this report to be a significant demonstration of the 
application of psychological viewpoint and methodology to pressing 
social problems. This categorization as action research, be it noted, is 
expressed at the risk of embarrassing the directors of the enterprise who 
have long engaged in the study of real problems without benefit of a more 
esoteric terminology applied to their highly productive efforts. 

Arthur H. Brayfield 

University of California 

Jucius, Michael J. Personnel management. Chicago: Richard D. Irwin,
Inc., 1948. xii + 696 pp. $6.00.

Personnel management is defined as “the field of management which 
has to do with planning, organizing, and controlling the performance of 
various activities concerned with procuring, developing, maintaining, and 
utilizing a labor force such that the objectives and purposes for which the 
company is established are attained as effectively and economically as 
possible, and of labor itself are served to the highest possible degree.” 

Around this definition Jucius has written a college textbook designed 
to provide a “realistic study of the principles and practices of personnel 
management.” The thirty chapters deal systematically with organiza- 
tional problems, approaches, and techniques in selecting, training, re- 
numerating, and motivating employees and in maintaining satisfactory 
labor-management relations. 

The presentation is in the typical textbook fashion. It is well- 
organized and will lend itself to outlining in the student’s notebook. 
The emphasis seems to be upon presenting a body of material to be studied 
under the guidance of a qualified instructor rather than upon providing 
a self-motivating treatise for the general reader. It differs from the 
standard textbook, however, in that supporting source materials are 
rarely given. Footnote references are infrequent and there are no sug- 
gested additional readings for separate chapters. 

The chief merit of the text is its well-organized and systematic presentation
of a wealth of information about personnel practices and principles.
It is chock-full of step-by-step procedures, examples of forms, and
practical suggestions for approaching the common problems faced by a




personnel department. It emphasizes the importance of “getting the
facts” and of careful follow-up and control after the appropriate steps
have been taken. The final chapter stresses the need for a research
point-of-view which would lead to continuous intensive study of all aspects of
personnel management.

Since this review is written by a psychologist for psychologists it is
pertinent to look for evidence of the impact of psychological findings upon
personnel practices as described, even though the author is not writing a
text on personnel psychology. In this respect the presentation is rather
weak. Recognition is made of individual differences and of the importance
of employee attitudes and feelings and, rather frequently, some
rather cogent observations on human nature are reflected in common
sense statements. There is little overt recognition, however, of the
dynamic nature of interpersonal relationships, of the fundamental problem
of democracy in industry, of the individual as a person rather than as
an employee. The areas in which psychology has made specific contributions
in industry are the most poorly presented, viz., interviewing,
counseling, and testing. The influence of the social structure in company
organization is not described, the Hawthorne studies being referred to
merely as an example of research.

In summary, “Personnel Management” will serve as an excellent
textbook in the field of business administration if supplemented by
source materials, if livened up by a stimulating instructor, and if the
students also take courses in personnel and industrial psychology.

Albert S. Thompson

Teachers College,
Columbia University

Lall, Sohan. Mental measurement. Allahabad: Allahabad Law Journal
Press, 1948. Pp. 88.

This little book presents results obtained from the administration of
three tests to approximately 2000 Indian children in 58 government high
schools. The children were 11+ years old and the tests were a group
verbal intelligence test, an English language test, and an arithmetic test.
No data are given on the construction of any of the tests; the author
states simply that they were patterned after the Moray House tests of
Godfrey Thomson. The tests in English and arithmetic were achievement
examinations in these areas.

Distributions of test scores in the entire sample are presented and
the method used for removing the skewness which appeared in all three
distributions is defended. Perhaps the most interesting part of the
monograph is the presentation of comparative test scores for four Indian
castes, for children from different geographical regions, and for children




whose parents fell into various occupational groupings. Most of the 
differences are quite small but some are statistically significant. It seems 
likely, however, that the apparent significance was, in many cases, a 
resultant of the small and probably unrepresentative samples. How 
representative the samples were we have no way of judging. 

On the whole the monograph is somewhat amateurish and reminds 
one of publications in this country of some 25 years ago. This is a
pioneer job, however, done under considerable difficulties, and the author 
deserves a great deal of credit. The references in the book are to Thom- 
son, Spearman, and Burt, under whom the author apparently had his 
training. 

Henry E. Garrett 

Department of Psychology 

Columbia University 

Evans, Ralph M. An introduction to color. New York: John Wiley and
Sons, 1948. Pp. x + 340. $6.00.

Any serious treatise on color is a major undertaking which necessitates 
the coordination of materials from physics, physiology, and psychology. 
This book was written with the avowed purpose of giving adequate treat- 
ment to materials from each of these three fields. Each phase is treated 
separately and then the three are interwoven near the end of the book.
Consistent, understandable terminology is achieved by employing com- 
mon speech meanings of words, with a minimum number of new words 
introduced and defined. Many pictures and graphs are employed to 
help the reader grasp the fundamentals. To a large degree the text is 
descriptive and non-mathematical. Although it is not assumed that the 
reader has more than an elementary knowledge of physics and psychology,
no simplifying omissions of subject matter are made. The author, head 
of the Color Control Department of the Eastman Kodak Company, is 
attempting to give the reader the benefit of his twenty years practical 
experience in the field. 

In this book, the author has been fairly successful in achieving his
aims. The material is not a popular treatise, but a simplified technical
discussion of highly complex and technical subject matter. Although 
not easy reading, persistent study of the material will be found rewarding. 
It is the only book known to the reviewer that attempts to give such a 
complete story of color. There is somewhat more emphasis upon physio- 
logical and physical than upon psychological aspects. Nevertheless, the 
psychologist will profit greatly by reading the book. Especially he will 
be able to correct many inaccurate notions obtained from elementary 
discussions. 


One wonders why a discussion of geometric optical illusions is included
in a treatise on color. Furthermore, to include the ambiguous
figure as an illusion is erroneous. The book would be more complete
if a discussion of color experiences of partially (red-green) color
blind persons were included. Another item that would improve the
book is a more complete discussion of color in illumination, and lighting
in relation to color in interior decoration.
Some of the more important sections deal with the use of colors in
photography, art and display situations. In general, this book is well organized
and clearly written. It will be useful both to those interested in
fundamental principles of color and to those working with color
problems in practical situations.

Miles A. Tinker

University of Minnesota


New Books, Monographs, and Pamphlets 
Books, monographs, and pamphlets for listing and possible review should be sent to 


Donald G. Paterson, Editor, Department of Psychology, 
University of Minnesota, Minneapolis 14, Minnesota 


Guiding human misfits. Alexandra Adler. New York: Philosophical 
Library, 1948. Pp. 114. $2.75. 

The psychology of development and personal adjustment. John E. Ander- 
son. New York: Henry Holt and Co., 1949. Pp. 720. $3.25. 

Fatigue and impairment in man. S. Howard Bartley and Eloise Chute. 
New York: McGraw-Hill Book Co., Inc., 1949. Pp. 429. $5.50. 

Psychological factors in education. Henry Beaumont and Freeman G. 
Macomber. New York: McGraw-Hill Book Co., Inc., 1949. Pp. 
318. $3.00. 

Psychology of personnel in business and industry. Roger M. Bellows. 
New York: Prentice-Hall, Inc., 1949. Pp. 499. $4.50. 

A summary of clerical tests. George K. Bennett and Ruth M. Cruik- 
shank. New York: The Psychological Corporation, 1949. Pp. 122. 
$1.25. 

Encyclopedia of criminology. Vernon C. Branham and Samuel B. Kutash,
Editors. New York: Philosophical Library, 1949. Pp. 527. $12.00. 

Psychological tests for retail store personnel. Dora F. Capwell. Pitts- 
burgh: Research Bureau for Retail Training, University of Pittsburgh, 
1949. Pp. 48. $1.00. 

Reading manual and workbook. Homer L. J. Carter and Dorothy J. 
McGinnis. New York: Prentice-Hall, Inc., 1949. Pp. 120. $1.75. 

The psychology of social classes. Richard Centers. Princeton: Princeton 
University Press, 1949. Pp. 256. $3.50. 

Applied experimental psychology, the psychology of engineering design. 
Alphonse Chapanis, Wendell R. Garner, and Clifford T. Morgan. 
New York: John Wiley and Sons, Inc., 1949. Pp. 402. $4.50. 

Introduction to the Szondi Test. Susan Deri. New York: Grune and 
Stratton, 1949. Pp. 354. $5.00.

Practical lessons in psychiatry. Joseph L. Fetterman. Springfield:
Charles C. Thomas, Publisher, 1949. Pp. 342. $5.75.

The art of readable writing. Rudolf Flesch. New York: Harper and 
Brothers, 1949. $3.00. 


Adolescence. C. M. Fleming. New York: International Universities 
Press, Inc., 1949. Pp. 261. $4.50. 




The energetics of human behavior. G. L. Freeman. Ithaca: Cornell
University Press, 1949. Pp. 350. $3.50.

Workbook manual for marriage and the family. Revised edition. John
Harvey Furbay. New York: Appleton-Century-Crofts, Inc., 1949.
Pp. 248. $2.00.

American social reform movements: their pattern since 1865. Thomas H.
Greer. New York: Prentice-Hall, Inc., 1948. Pp. 313. $4.00.

Elmtown’s youth. A. B. Hollingshead. New York: John Wiley and
Sons, Inc., 1949. Pp. 480. $5.00.

Adolescent development. Elizabeth B. Hurlock. New York: McGraw-Hill
Book Co., Inc., 1949. Pp. 566. $4.50.

Learning to drive safely. A. R. Lauer. Minneapolis: Burgess Publishing
Co., 1949. Pp. 145. $2.25.

Communications research 1948-1949. Paul F. Lazarsfeld and Frank
Stanton. New York: Harper and Brothers, 1949. Pp. 332. $4.50.

Older people and the church. Paul B. Maves and J. Lennart Cedarleaf.
Nashville: Abingdon-Cokesbury Press, 1949. Pp. 272. $2.50.

The effect of experience on nursing achievement. R. Louise McManus.
New York: Bureau of Publications, Teachers College, Columbia
University, 1949. Pp. 64. $2.10.

Psychiatry: its evolution and present status. William C. Menninger.
Ithaca: Cornell University Press, 1949. Pp. 149. $2.00.

Genetics, medicine, and man. H. J. Muller, C. C. Little, and Laurence
Snyder. Ithaca: Cornell University Press, 1949. Pp. 164. $2.25.

An introduction to clinical psychology. L. A. Pennington and I. A. Berg,
Editors. New York: The Ronald Press Co., 1949. Pp. 600. $5.00.

Education through art. Herbert Read. New York: Pantheon Books
Inc., 1949. Pp. 320. $5.50.

Psychodiagnosis. Saul Rosenzweig. New York: Grune and Stratton,
1949. Pp. 380. $5.00.

The clinical application of psychological tests. Roy Schafer. New York:
International Universities Press, Inc., 1948. Pp. 346. $6.75.

Problems of early infancy. Milton J. E. Senn, Editor. Second Conference
of Josiah Macy, Jr. Foundation. New York: Josiah Macy,
Jr. Foundation, 1948. Pp. 120. $1.00.

Individual behavior. Donald Snygg and Arthur W. Combs. New York:
Harper and Brothers, 1949. Pp. 386. $3.50.

Learning theory in school situations. Esther J. Swenson, G. Lester Anderson
and Chalmers L. Stacey. Minneapolis: University of Minnesota
Press, 1949. Pp. 103. $1.50.

Thematic apperception test. Charles E. Thompson. Cambridge: Harvard
University Press, 1949. Manual $.50, Test $5.00.




Man’s quest for significance. Lewis Way. New York: The Macmillan 
Co., 1949. Pp. 211. $3.50. 

The inner world of man. Frances G. Wickes. New York: Henry Holt 
and Co., 1949. Pp. 320. $5.00. 

Trends in student personnel work. E. G. Williamson, Editor. Minne- 
apolis: University of Minnesota Press, 1949. Pp. 417. $5.00. 

Jobs and the man: a guide in understanding and dealing with workers. 
Luther E. Woodward and Thomas A. C. Rennie. Springfield: 
Charles C. Thomas, Publisher, 1946. Pp. 125. $2.00. 

Occupational outlook handbook. Bureau of Labor Statistics, Bulletin 1949, 
No. 940. Washington, D. C.: Superintendent of Documents, U. S.
Government Printing Office, 1949. $1.75. 

Guidance handbook for elementary schools. Office of Los Angeles County 
Superintendent of Schools. Hollywood: California Test Bureau, 
1948. Pp. 158. $2.40.

Guidance handbook for secondary schools. Office of Los Angeles County 
Superintendent of Schools. Hollywood: California Test Bureau, 
1948. Pp. 246. $3.00. 


Journal of Applied Psychology


An Objective Analysis of Morale 


William James Giese 
William James Giese, Ph.D. and Associates, Chicago 3, Illinois 


and 


H. W. Ruter 
Aldens, Inc., Chicago 7, Illinois 


The profit and loss statement of a business is affected by that elusive
factor called morale. Since most successful executives have recognized
this, many companies have attempted to get a workable measure of the
status of morale among their employees. The most successful of these
attempts has been the morale survey through the use of a correctly designed
questionnaire to measure the attitudes of the employees toward
supervision, working conditions, wage rates, chances for advancement,
and similar important attitude areas. Although the results of such
morale surveys are usually interesting to management (sometimes the
results are even startling), the questionnaire method is cumbersome,
costly, and slow. Also, because of its nature, the morale survey can be
run successfully only at intervals of about once a year. This limitation,
in addition to the costliness, often prevents the detection of an undesirable
trend in morale when corrective action is easiest and most effective.

In addition to these very serious limitations, the morale survey becomes
a row of question marks when a cost and savings analysis of it is
made. The only answer to the question of costs is a general agreement
among executives that poor morale costs the company money. But just
how much money has not been determined, for there are no accounts in
which poor morale shows up as an identifiable and cost accountable loss.
As business moves into a more competitive era this question of costs and
savings presents itself with an ever increasing urgency. If increasing
the morale of the employees costs money, how much will be saved for
each dollar spent? Is there a straight line relationship; that is, are the
savings per dollar spent the same regardless of the amount spent? Where
is the break-even point? At what point does the law of diminishing
returns begin to operate?



After considering the above facts and questions along with their 
implications, the personnel manager * began to build an index based on 
objective records. After a number of conferences we organized a pro- 
gram of research that would give us some of the answers and tie the 
results directly to employee morale. 


Purpose 


Our primary purpose was to analyze the relationship of the objective 
records of departmental performances to morale as measured by the 
questionnaire method. Once these relationships are known, it is merely 
a mathematical problem to determine the best relative mathematical 
weights for each of the factors for the purpose of predicting morale. 


In order to make certain that our basic records held promise of
providing fundamental data for the prediction of morale, we first examined
these factors as well as the morale score itself. Since the morale
questionnaires were not scored but had only percentages of responses for
each part of an item, it was necessary to make up a scoring system for the
morale questionnaire.*
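
The scoring system described in the accompanying footnote (values of 1 for the most unfavorable response up to 6 for the most favorable, summed over the seventeen scorable items) can be illustrated with a short sketch. It is hypothetical rather than the authors' own computation: it assumes that the response percentages for each item are listed in order from the most unfavorable to the most favorable response, and the function names are invented for illustration.

```python
# Hypothetical sketch: a departmental morale score built from the
# percentages of responses to each part of an item.

def item_mean(percentages):
    """Mean item value from response percentages ordered from the most
    unfavorable response (value 1) to the most favorable response."""
    total = sum(percentages)
    if total <= 0:
        raise ValueError("no responses recorded for this item")
    weighted = sum(value * pct
                   for value, pct in enumerate(percentages, start=1))
    return weighted / total


def departmental_morale_score(items):
    """Sum of the item means over the scored items of one department."""
    return sum(item_mean(percentages) for percentages in items)


# Extreme cases for seventeen six-response items (illustrative only).
all_unfavorable = [[100, 0, 0, 0, 0, 0]] * 17
all_favorable = [[0, 0, 0, 0, 0, 100]] * 17
print(departmental_morale_score(all_unfavorable))
print(departmental_morale_score(all_favorable))
```

Under these assumptions a departmental score can range from 17 to 102 points.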

The first step was to learn if on the basis of our scoring there was an 
adequate range of difference between the departments in morale. Similar 
analyses were made for six factors * on which objective data were avail- 
able in the personnel department and which appeared feasible for study to 
determine their prediction value in indicating departmental standings. 

Our final purpose was to set up a simple method for the determination
of departmental morale based upon factors each of which is cost
accountable.


* H. W. Ruter, Personnel Manager of Aldens, Inc., 511 S. Paulina Street, Chicago 7,
Illinois. Aldens is a national mail order company which retails by mail and through
department stores all types of clothing, housewares, furniture, etc.

* The morale questionnaire was scored by giving the most unfavorable response to
an item a value of 1, the second most unfavorable response a value of 2, and so on up
to 6 for the most favorable response. Seventeen of the eighteen items in the questionnaire
were amenable to this scoring system. The morale score was merely the total
points for the seventeen items.

* The factors considered were: (a) Seasonal accumulated Departmental Production
Efficiency; (b) Seasonal accumulated Departmental Error Efficiency, covering errors
not affecting customers, i.e. errors in company handling which may delay the completion
of a sale but are not otherwise noticeable to the customer; (c) Seasonal accumulated
Departmental Error Efficiency, covering errors affecting customer such as items charged
and omitted, wrong merchandise, size, color, etc. This type of error is most costly
in customer relations, merchandise loss, etc.; (d) Annual turnover percentage;
(e) Seasonal accumulated tardiness percentage; (f) Seasonal
accumulated absence percentage.


The Results 


The morale score gave an adequate separation between departments.
In Figure 1, each square represents a department, and the number below
the square is the average (arithmetical mean) morale score of the employees
in the department.


Fig. 1. Distribution of departmental morale scores. Horizontal axis: Morale Score.


In productive efficiency the mean of the departments varied from
80% to as high as 113%. This means that there are large enough
differences between departments in their productive efficiency averages
to use this factor in the correlation analysis. Figure 2 illustrates departmental
differences in average productive efficiency.




For the remaining 5 factors the departmental differences are summarized
in Table 1.

Table 1

Means and Standard Deviations of the Departmental Means for the Six Objective
Factors used in the Analysis of Morale

Factor                                                 Mean      S.D.

Per Cent Production Efficiency                         95.64      7.43
Per Cent Error Efficiency Not Affecting Customers      78.41     20.85
Per Cent Error Efficiency Affecting Customers          61.60     17.00
Per Cent Turnover                                     222.70     80.02
Per Cent Late                                           9.67      2.87




The per cent error efficiency NOT affecting customers has a cluster
of departments averaging between 84% and 94% (14 out of 22) but due
to the fact that the entire range is from 20% to 100% we kept this factor
for correlational purposes.

The per cent error efficiency affecting customers shows a wide range
of departmental performances and an even spread of differences throughout
the range. Such a condition makes this factor a promising partial
for the prediction of morale. Also, when the cost analysis is made of 
such errors, the potential savings due to reduction of these errors will in 
all probability be many times those possible with errors NOT affecting 
customers. The reason for this is that errors affecting customers are 
not only more costly but that there are only a few departments with a 
high efficiency standing. 

The highest departmental labor turnover was 350% and the lowest 
40%, almost a 9 to 1 ratio. The costliness of turnover can easily amount 
to a six or seven figure annual loss since the minimum loss per employee 
termination is $100.00. In addition, this factor for the purpose of pre- 
dicting the morale standing of a department should be of great importance 
because of the large and even spread of departmental standings. 
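
A rough worked figure shows how quickly the minimum of $100.00 per termination accumulates. The sketch below is illustrative only: the function name is invented, and the company-wide case simply applies the unweighted mean of departmental turnover from Table 1 (222.70 per cent) to the 3000 employees covered by the study, which is an approximation rather than a figure reported by the authors.

```python
# Illustrative arithmetic only; not a computation reported in the study.

MIN_LOSS_PER_TERMINATION = 100.00  # dollars, the minimum quoted in the text


def annual_turnover_loss(employees, turnover_pct,
                         loss_per_termination=MIN_LOSS_PER_TERMINATION):
    """Annual dollar loss implied by a yearly turnover percentage."""
    terminations = employees * turnover_pct / 100.0
    return terminations * loss_per_termination


# Assumed inputs: 3000 employees at the unweighted mean departmental turnover.
company_loss = annual_turnover_loss(3000, 222.70)

# A department of average size (120 employees) at the highest and
# lowest observed departmental turnover.
worst_department = annual_turnover_loss(120, 350)
best_department = annual_turnover_loss(120, 40)

print(round(company_loss), round(worst_department), round(best_department))
```

Under these assumptions the minimum loss is about $668,100 a year, a six figure sum, and a department of 120 employees loses about $42,000 at 350% turnover against about $4,800 at 40%.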

There are wide departmental differences in the per cent of the em- 
ployees who are late. Here the range is from 3.5% to 14.5%; a ratio of 
well over 3 to 1. This factor also has an adequate spread and evenness 
throughout the spread for correlation analysis. 

Since a department must carry more personnel to meet the work load 
demands if absenteeism is high, departmental variations in per cent 
absent were analyzed. The lowest was 3¼% and the highest was
11%; a ratio of almost 3 to 1. The department which averaged 11% 
absent for the year had to carry at least 11% more employees to meet 
adequately the work demands placed upon it. 
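
The phrase “at least 11% more employees” can be checked arithmetically. If a given per cent of the staff is absent on an average day, the payroll must be large enough that the remainder still covers the work load. The sketch below is an illustration under that simple assumption; the function name is hypothetical.

```python
# Illustrative arithmetic: extra staff a department must carry so that
# the employees actually present can meet the work load.

def extra_staff_fraction(absent_pct):
    """Fractional increase in payroll needed to offset a given per cent absent."""
    present = 1.0 - absent_pct / 100.0
    if present <= 0:
        raise ValueError("per cent absent must be below 100")
    return 1.0 / present - 1.0


print(round(100 * extra_staff_fraction(11), 1))
```

For 11 per cent absent the required increase works out to about 12.4 per cent, which is consistent with the statement that such a department must carry at least 11% more employees.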

Table 2 shows the results of correlating each of the six factors with the 
morale score and with each other. 
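
For readers who wish to repeat this step on their own records, the product-moment correlation between two sets of departmental averages may be sketched as follows. The departmental figures shown are invented for illustration and are not taken from the study; the function name is likewise hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no spread cannot be correlated")
    return sxy / sqrt(sxx * syy)


# Invented departmental averages: one point per department.
morale = [48.0, 52.0, 55.0, 58.0, 50.0]
per_cent_absent = [10.0, 8.0, 6.0, 4.0, 9.5]

r = pearson_r(morale, per_cent_absent)
print(r < 0)  # low morale goes with high absence in this made-up example
```

Each department contributes a single point, so the coefficient describes departments and not individual employees.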

The two factors with the highest relationship to morale are per cent
turnover and per cent absent. Both relate to morale with a fair negative
relationship. When morale is low, absenteeism and turnover tend to be
high.

Error efficiency affecting customers has a higher relationship with
morale than does error efficiency NOT affecting customers. Most interesting
is the low negative relationship between these two factors. That
is, those departments with a high efficiency in errors affecting customers
have a slight tendency to be low in errors NOT affecting customers.
This difference could be accounted for on the basis of departmental emphasis
on the importance between the two types of errors. There is a




slight negative relationship between morale and per cent late in a department.
Since outside factors such as number of transfers made on public
transportation, distance from place of work, weather, etc. probably have
little or no relationship with morale but a fairly high one with lateness,
a low relationship between morale and per cent late is to be expected.
There is a slight relationship in the positive direction between morale and
production efficiency, but it is much less important to morale than per
cent turnover or per cent absent. This means that high morale is only

Table 2

Pearson Product-Moment Coefficients of Correlation between Objective Records of
Departmental Performance and the Morale Score *

                                                           1      2      3      4      5      6

1. Morale Score
2. Per Cent Productive Efficiency                        +.19
3. Per Cent Error Efficiency NOT Affecting Customers     +.15   -.50
4. Per Cent Error Efficiency Affecting Customers         +.27   +387   -A
5. Per Cent Turnover                                     -.42   -.18   +.05   -.25
6. Per Cent Late                                         -.20   -.18   +.30   -.28   +.33
7. Per Cent Absent                                       -AT    -.15   -.18   -.07   -.15   -.08

* The above correlations were computed from the averages of 25 departments.
Hence the scattergram was composed of 25 points. The total number of employees
represented by the 25 points was 3000. The number of employees in the departments
ranged from 14 to 405 with a mean of 120 and a sigma of 90.5. The correlation between
morale score and the number of employees in the department was -.07.


slightly related to per cent productive efficiency, which makes it entirely
possible for a department to have a high standing from the standpoint of
direct unit costs, but it could have higher absenteeism, and turnover,
and a tendency to greater errors. In such a department this initial
efficiency should be readjusted (and it would be downward)
because of the additional costs incurred due to the turnover, absenteeism,
and correction of errors. The costs of turnover and absenteeism usually
are hidden in the various burden, administration, indirect, and similar




accounts. These costs do exist, and some departments waste much more 
money per employee than do others. Often a department can reduce its 
total unit costs more through concerted effort on these indirect costs 
than it can through merely increased output. However, such costs are 
usually spread on a per employee or per dollar of direct payroll basis so 
that the more efficient departments from the standpoint of these indirect 
charges have to carry the load of the more poorly run departments. 
Since such factors as turnover and absenteeism relate to both costs and 
morale, it should pay top management to reward those in charge of the 
departments who are able to hold these factors to a practical minimum. 

Since the correlations between the six factors and morale tend to 
be somewhat higher than the correlations between all of the various 
factors, it paid to compute a multiple correlation. The multiple cor- 
relation was .71 which is high enough to warrant the use of these six 
factors for the construction of an objective morale index. Table 3 lists 
the Beta weights for each factor. 


Table 3

Factor                                                    Beta Weight

1. Per Cent Productive Efficiency                            .0630
2. Per Cent Error Efficiency NOT Affecting Customers         .1674
3. Per Cent Error Efficiency Affecting Customers             .1227
4. Per Cent Turnover                                         .4348
5. Per Cent Late                                             .0758
6. Per Cent Absent                                           .4894

R0.123456 = .706          r0.uvwxyz = .706


In order to check our calculations we multiplied each factor for each 
department by the Beta weight given in Table 3 and then added the 
six products for each department to arrive at an objective morale index. 
We correlated this objective morale index with the morale score and
obtained .706 which is shown after r0.uvwxyz in Table 3.
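
The construction of the index as a weighted sum may be sketched as follows. This is a hypothetical illustration, not the authors' worksheet: the text does not state whether raw or standard scores were multiplied by the Beta weights, so the sketch assumes standard scores, and because the signs of the weights cannot be recovered from this copy of Table 3 the weights and departmental figures used here are invented.

```python
from math import sqrt


def standardize(values):
    """Convert departmental values to standard (z) scores."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("a factor with no spread cannot be standardized")
    return [(v - mean) / sd for v in values]


def objective_morale_index(factors, betas):
    """Weighted sum of standardized factors, one index value per department.

    factors: dict of factor name -> list of departmental values
    betas:   dict of factor name -> weight (assumed, not taken from Table 3)
    """
    z = {name: standardize(factors[name]) for name in betas}
    n_departments = len(next(iter(z.values())))
    return [sum(betas[name] * z[name][d] for name in betas)
            for d in range(n_departments)]


# Invented figures for four departments and two of the six factors.
example_factors = {
    "per_cent_turnover": [350.0, 220.0, 150.0, 40.0],
    "per_cent_absent": [11.0, 8.0, 6.0, 3.5],
}
example_betas = {"per_cent_turnover": -0.4, "per_cent_absent": -0.5}

index = objective_morale_index(example_factors, example_betas)
print(index)
```

With negative weights assumed for turnover and absence, the department with the lowest turnover and absence receives the highest index value; correlating such an index with the questionnaire morale score is the check described above.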


Recommendations 


Since this study has determined that there is an important relationship 
between morale and the combined factors of production efficiency, error 
efficiency, labor turnover, tardiness and absence it is recommended that 
the index of these factors be used as a determinant of the relative levels of 
morale in various operating departments. The Objective Morale Index 
is also a measure of supervisory effectiveness and may be used to supple- 


Those departments falling into the lowest morale classifications should
be carefully scrutinized to determine whether or not the problem is
department-wide or if it is limited to certain activities or working units.
An examination of the index factors of each group will determine this.
Once low morale has been localized as much as possible a diagnostic
investigation should be conducted to determine causes. Corrective
action is then to be applied where it is needed.

Another questionnaire survey should be undertaken to secure data for
validation of the Objective Morale Index. Thereafter the questionnaire
survey need be used only at longer intervals and not so much for
morale measurement as for obtaining information for the diagnostic
investigations.


Summary 


The purpose of this study was to predict the morale of departments
from objective data. A morale questionnaire was scored so that a quantitative
score was available. The six objective factors of per cent productive
efficiency, per cent error efficiency NOT affecting customers, per
cent error efficiency affecting customers, per cent turnover, per cent late,
and per cent absent were intercorrelated and correlated with the departmental
morale score. The multiple correlation between these 6 factors
and morale was .71.

Since the six factors are objective and in most cases cost accountable
their use for morale measurement is not only practical, but meaningful
to operating executives in business and industry.

The relative weightings for the six factors for the prediction of morale
are probably specific to this particular business. Therefore, a similar
analysis must be made before this method of morale measurement can be
applied to other businesses or industries.

Since the economic milieu may have some effect on the relative weights
of the factors, a study is now in progress to revalidate the Objective
Morale Index.


Implementing an Employee Opinion Survey 


Major Fred E. Holdrege, Jr., USAF 
Air Materiel Command, Dayton, Ohio 


The problem of securing whole-hearted support from supervisors and 
employees is basic to the proper conduct and “follow-through” of an 
employee opinion survey. It is believed that many readers of the 
Journal of Applied Psychology will be interested in the information used 
to implement the Air Materiel Command’s 1948 survey of the opinions 
of approximately 80,000 civil service workers employed at its eight major 
bases located in various parts of the country. 

The survey itself was listed as a “Survey of Employment Conditions” 
in a covering letter which accompanied the opinion questionnaire itself. 
The letter is as follows: 


Subject: Survey of Employment Conditions 
To: Civilian Employees of the Air Materiel Command 

1. To provide an opportunity for you to express your sentiments regarding personnel 
policies and practices which affect yourself and your employment with the Air Materiel 
Command, I have directed the Chief, Personnel and Administration Department, this 
Headquarters, to conduct a survey by means of the inclosed questionnaire. 

2. It is my hope that this survey will reveal any existing difficulties and discrepancies 
which will be corrected for the benefit of all employees by the proper application of 
personnel management techniques. 

3. You can help to make this survey a success by basing your reply to each question
on an honest estimate of your views as to the manner in which the AMC is operated, 
considering, of course, the restrictions imposed upon us by Department of the Air Force, 
Civil Service directives, and other legal restrictions. 

J. T. McNarney 
General, USAF 
Commanding 


Instructions for filling out the questionnaire as printed on the opinion 
questionnaire were as follows: 


How I Feel About My Job 
(A Survey of Civilian Employee Attitudes) 


You are asked to answer this questionnaire as a part of a survey of civilian
employee opinions and attitudes which is being made by the Personnel
Analysis Office, Headquarters, Air Materiel Command. The purpose of this
survey is to find out how our employees feel about their work and the conditions
under which they work, so that sound plans for improvement can be made.
This is not a test. There are no right or wrong answers. Just answer
the questions in the way you, yourself, feel about them. That is the only
correct answer.




Instructions 


Read each question or statement carefully to make sure you understand it
before marking your answer. If you have any questions raise your hand.

Mark only one answer to each question. If you have more to say, write
it on the last page of this questionnaire, but first mark one of the suggested
answers which most clearly expresses your opinion.

Before turning in your questionnaire, check to make sure you have answered


The questionnaire itself was 24 pages long containing a total of 146
questions calling for multiple choice checking of answers and open-end answers
as well as a page for making further remarks if desired. The contents
may be classified as follows: Part I, My Job, 25 questions; Part II,
History, 8 questions; Part III, Job Relations, 29 questions; Part IV,
Supervision, 29 questions; Part V, Training, 10 questions; Part


A report was prepared for each installation, copies being sent to the Commanding
officers of each installation. Each report was accompanied by
a Foreword designed to secure maximum attention and maximum action
on the findings. The Foreword is as follows:


1. Purpose: This final report has been prepared to facilitate ready comparison
of the findings at your installation with that of the Command as to
the results of the employee opinion survey conducted at major AMC
installations, March-April 1948.


In the final analysis, the value to be derived from this opinion survey
depends upon whether management is psychologically prepared to translate
the findings of the survey into action. Without action, the survey is merely
a slip of paper. With action, the survey becomes an effective means to an
end . . . that of improved Command operational teamwork and economy.

It has been found that top management is too often inclined to take
action only on incidental or secondary findings of such surveys . . . ignoring
more important revelations. In some instances, top management is merely
content to learn that the morale of its organization is no worse than that of


Naturally, there is a defensive reaction on the part of operating officials
to the disclosures brought about through morale or opinion surveys due to
interpretation of these disclosures as a reflection upon their ability. Hence,
there is a tendency toward unwarranted discounting of survey data and a
closing of the mind to self-analysis.

An objective viewpoint rather than a justification attitude must be taken
if the results anticipated are to be gained.




3. Employee Opinion: 


a. In reviewing the graphs and tabulations in this booklet, it is extremely 
important to bear in mind that these opinions of your employees are based on 
either (1) factual conditions, or (2) upon erroneous impressions. Whichever is 
the case, the only satisfactory solution lies in prompt remedial action. 

b. It is also realized that adverse comment by employees with respect to a 
particular activity is not necessarily indicative of an activity poorly adminis- 
tered. It may be that from management’s standpoint the activity is an efficient 
one. Whichever is the case, is of minor consequence; adverse employee opinion 
on any particular activity is a danger signal that management cannot afford to 
ignore. It is a symptom that calls for diagnosis and treatment.

c. If adverse opinions are based on factual conditions, then corrective action
with respect to these conditions seems the only logical solution. If, on the 
other hand, unfavorable opinions expressed by employees are based upon 
erroneous impressions or information, then action should be taken to bring the 
true facts to the attention of employees so that they may be more thoroughly 
and accurately informed. 


4. Comparative Charts: A Command comparison of the nature presented 
herein naturally places emphasis upon deviations from the average of the 
Command. These deviations are important, of course, but other aspects of 
the graphs must not be entirely disregarded. For example, matching or 
equaling the Command average is not necessarily indicative of a satisfactory 
condition for the reason that the Command average may reflect a condition 
unfavorable throughout the entire Command. Then too, it should be pointed 
out that the Command average is a composite picture of all installations and 
that a proportion of the installations must necessarily fall above the Command 
average. 

5. Chart Interpretation: In order to assist you in interpreting the charts,
recommendations have been made on the survey questions which show a
significant variation from the Command average. While such recommendations
cannot be as accurate as on-the-spot observations, they are a
composite of the thinking of various AMC Headquarters’ organizations coupled
with observations derived from the questionnaires themselves.


6. Assistance: 


a. In order to assist those installations which fell below the average of the
Command on various questions, names of the installations which presented the
most favorable picture have been set forth in the final section of this report.
Installations below the Command average are urged to contact these “high”
installations for suggestions as to possible means of improvement. Liaison
with these installations, leading the Command in methods and procedures used,
is encouraged. An exchange of ideas is certainly one source of self improvement.
b. If assistance of the Hq Personnel Analysis Office is desired with respect
to any morale or employee attitude problem, a brief statement of the problem


should be forwarded to this H i iti ty 
an on-the-spot analysis will be ee o> Shea emia 


7. Data: 


a. It should be remembered that the data secured in the conduct of this
survey represent the first tangible information available to management as to
what employees think of Command Management and operations. Management
can estimate, have a fairly good idea, or may be quite certain it knows
pretty well how employees think . . . but up until the time of the survey this




b. Management’s opinion of Command operations, on the other hand, is
well-informed through the media of Inspector General reports, liaison
with Command officials, Comptroller reports, etc. You might ask, “Just
how important is it for management to have a thorough understanding of
employee opinion?” Actually, having this one understanding is an absolute
prerequisite to successful management. Management cannot afford to forget
that “Management is the development of people and not the direction of
things.” Every policy that is written, every plan that is developed, every
decision that is made, and every activity that is initiated must be considered

This seems an appropriate place to call attention to the one fellow who
may be overlooked in follow-up of the questionnaire . . . and that is the
employee who volunteered the information. He has a right to expect management
to inform him of any action taken on the information he supplied.
Correspondence reaching this Headquarters indicates that there is still much
to be desired from an employee standpoint as to action being taken. It is
realized that some adverse situations cannot be solved over night, nor even in
months, yet unless the individual employee is acquainted with steps being
taken or under consideration, he is likely to jump at the conclusion that his
opinion is not even being considered. The failure of management to “keep
faith” will nullify any further attempts to gain the cooperation of employees.
It cannot be over emphasized that employees must be informed of what
is being done. They fulfilled their part of the contract by giving their honest
opinion and suggestions for the improvement of working conditions. In
General McNarney’s letter to each employee, it was pointed out that adverse
situations revealed by the survey would be corrected for the benefit of all
employees. Thus, information on questionnaire follow-up action must be
passed on to employees if the management-employee relationship is to remain
unimpaired. Various media can be used to inform employees, such as notices,
articles in the installation newspaper, supervisors’ meetings, etc.


9. Action Taken: This Headquarters would also like to know of action
taken on survey results. Communications should be addressed to the attention
of the Personnel Analysis Office (MCAA). An analysis of follow-up
action will be made and those steps which appear applicable to other field
installations will be disseminated for the information and assistance of all.
10. Results to be Expected:¹ When results of the morale survey are properly
accepted and acted upon, the following benefits may be expected:


a. The possibility of a permanently higher level of employee morale and
productivity brought about by concrete change in conditions, practices, and
policies.
b. The improved morale brought about by the very administration of the
survey itself. Most surveys bring about immediate morale gains based upon
employees’ discovery that management is really concerned about their
feelings. When changes are initiated as a result of employees’ opinion and it
is made known to them that their feelings have caused the change, they will
feel that they have some part in the determination of management policies.
If expected changes are not forthcoming or long delayed, the boost
in morale is soon lost as a result of employee disillusionment and frustration.
c. The opportunity to increase the understanding of management employee

¹ Making the Most of Morale Surveys by F. F. Bradshaw and Herbert E. Krugman,
Richardson, Bellows, Henry and Company, Inc.




Madying probie roel By tba IVE 


As a result of an initial survey, employees will be more
voluble in their replies to questions on which they are asked to comment;
hence, more effective results can be obtained from these questions
on which employees are asked to comment.
e Pola prim n A prer wm the organization to a permanent goal of ever- 
rising levels ean be measured through the means of annual morale 


In addition to the Foreword each report also contained two sample
letters with a suggestion as to how they might best be used.
The suggestion for the preparation and use of the follow-up letters is
as follows:


Sample Letters of Employee Opinion Follow-Up 
As an example of one means of following-up on the opinion survey, the following
pages contain two suggested survey action letters which may be easily adapted to the
situation by Commanding Generals of the various Air Materiel Areas.
One is a letter to all employees acquainting them with remedial action taken on
adverse survey disclosures; the other a letter to all supervisors pointing out supervisory


| 


you individually to obtain your

conditions under which you work each day.

this is not possible, I welcomed the opportunity to have the AMC employee

at this Headquarters. This appraisal of all phases of your

is of particular interest to me as an effective means for improving working
conditions and work simplification.

view of the time involved in determining the practicability of making suggested

and working out of necessary readjustments many of the

cannot be put into effect immediately. However,


has been taken as a result of your suggestions:
1. Exhaust fans have been installed in the aircraft
the instrument


in the repair shops have been rearranged to eliminate


I desire to express my appreciation of the many opinions and comments submitted.
It is my hope that any situations will be corrected to the greatest extent




Our combined efforts to improve working conditions and promote good relations
will provide all of us with a still better place in which to work.


The second sample letter intended for supervisors is as follows:


Headquarters 
Air Materiel Area 


An analysis of the AMC employee opinion survey encompassing every
phase of our work activity, has afforded me a wealth of information that I have accepted
as an index of our “state of health.” We, as management, must assume
responsibility for the unfavorable conditions in our personnel relations as well as the
favorable aspects presented by our people. However, you are management to the
employee and the manner in which you execute the duties and responsibilities necessarily
attached to your position influences the employee's attitude toward his employment


supervising 
and “think through” the elements of his job and then analyze his
. As a result of such analysis you may decide that


In order to background for this program, I have selected for brief dis-
. E marc story raais that have been neglected to varying degrees


an employee the credit he deserves for doing a good job or making a good
suggestion requires very little time and effort but has a tremendous effect on group
morale. You have nothing to lose in giving credit or recognition for it is merely




passing on to another that which rightfully belongs to him. The practice of giving 
credit where and when it is due is one of the most impressive “tools” of supervision for 
the craving to be appreciated and recognized as an individual of worth is a universal 
human trait. 

3. We are by nature creatures of habit. It is a natural tendency to resist changes
or to discredit the changes that another has proposed. You must reach beyond this 
narrow view and welcome all suggestions intended to improve or streamline operation. 
Give each suggestion serious consideration; if it is not advisable to make the change, 
explain your reasons (which must be sound) to the employee. If the suggestion has 
merit and can be incorporated in the operation, give the employee credit for proposing it. 
The attitude you assume in the initial discussion of the suggestion and the manner in 
which you receive ideas will either make employees feel free to bring other ideas to you 
or discourage them from ever approaching you again. We must encourage constructive 
thinking. It was evidenced in the Survey analysis that employees want to be happy in 
their work—it is our responsibility to provide the opportunity. 

4. It is my contention that at least half of the dissatisfaction and complaints can be
attributed to erroneous information or to the lack of dissemination of information to 
employees. A good portion of the blame for this falls on our shoulders for many of the 
criticisms received concerned items which a good supervisor could answer. As an 
example, a surprising number of employees do not have a clear concept of the difference 
in the work performed by the Civilian Classification Branch and the Civilian Utilization 
Branch. These are two distinct personnel services and every employee at some time
or another is influenced by decisions made by these Branches. This type of information
should be common knowledge among employees. A supervisor who shares information
promotes a good group spirit and gives the individual an assurance of some prestige in
the group.

It is you, the key men and women on the management team, who have been entrusted
with the grave responsibility of effectively applying the principles of supervision.
Clear your minds and thoughts of prejudices or selfish ambitions; be eager to accept new
ideas and sound procedures. The degree of your cooperation will determine the success
of our personnel program.




Commanding General 


Information obtained from the survey has proven its value even 
beyond that originally anticipated. As a result of the questionnaire, 
various phases of the over-all “working conditions” picture have been 
subjected to re-evaluation in light of employee opinion. 

Many improvements have been made. Of course, there are a number 
of situations which will require time and continuous remedial effort 
before some degree of satisfaction can be assumed. On those problems 
extremely complex, such as the proper utilization of skills and abilities, 
progress can be achieved only by constant surveillance and study. Then
too, we must be extremely careful to insure the maintenance of standards 
on those practices most favorably rated by employees. 

Plans are now going ahead for the conduct of the second command-wide
survey. With the results of the first survey to serve as a basis for
comparison, it will be possible for the first time to measure improvement
in policies and practices from an employee standpoint.
Any attempt at a “one-shot” approach to achieve a “once-and-for-all”
solution is being carefully avoided. Instead, the picture of the Command
operations as presented by employees will serve as a guide and
basis for the checking and development of a personnel program designed
to achieve maximum accomplishment through our human resources.


February 14, 1949. 


A Trade Test for Power Sewing Machine Operators 


Edward Glanz 
Teachers College, Columbia University 


The David Clark Company Inc., like many other garment factories,
hires many power sewing machine operators throughout the year. The 
problem has been to reduce the turnover resulting from having to release 
unsatisfactory stitchers, and also to free foreladies from having to teach 
new employees how to operate a power sewing machine. In the past 
reliance was put on the statements of the applicants as to their skill and 
experience. This process frequently yielded poor workers and was expensive
in turnover and foreladies’ time.

The problem at David Clark Company Inc., a subsidiary of Munsing- 
wear Inc., was to find out which of the numerous applicants claiming 
skill and experience could actually operate a power sewing machine on 
the lightweight cotton used in making brassieres and girdles. 


Design and Sampling 


In attempting to devise a trade test for stitchers pertinent previous
work was carefully studied¹ and production line work was observed.
The test that is now to be described is based on an hypothesis resulting
from a combination of these two sources and validated on present employees.
The four sections of the trade test as well as the trial sample
were given to all of the power sewing machine operators in the plant. 
These operators were unselected as to level of skill or productivity except 
as experience itself is a selector. There were forty-nine operators, all 
female. Their experience ranged from two months to ten years and 
their productivity ranged from a barely acceptable amount of work to 
twice the acceptable amount of work. 

The plan was to correlate the results of the trade test with the supervisors’
ratings and production records. If validated, the trade test
could then be used to classify applicants on a proficiency scale. 


Criteria Selected 


Supervisors’ ratings were obtained from the two general supervisors of 
all of the operators. Both of these supervisors were qualified rate setters 
¹ See especially: Otis, J. L., The prediction of success in power sewing machine
operating. J. appl. Psychol., 1938, 22, 350-366. Blum, M. L., Selection of sewing
machine operators. J. appl. Psychol., 1943, 27, 35-40.


experienced in evaluating the speed and quality of the operators’
work. The ratings were obtained independently and before any test
scores were available. The speed and quality ratings were obtained for
each worker and combined into an overall rating. The product moment
correlation of the two supervisors’ overall ratings was +.87.
Each operator was rated on the following five point scales for
speed and quality:

Speed Rating Scale          Quality Rating Scale
Very Fast Worker            Highest Quality Work
Faster Than Most            Quality Better Than Most
Average Speed Worker        Average Quality Work
Slower Than Most            Quality Poorer Than Most
Very Slow Worker            Very Poor Quality Work
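
The agreement between the two supervisors can be illustrated with a short product-moment calculation. This is a minimal sketch only: the numeric coding of the scales (5 for the top step down to 1), the simple addition of speed and quality into an overall rating, and the ratings for six operators are all assumptions made for illustration and are not taken from the study.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two cases")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return sxy / math.sqrt(sxx * syy)

def overall_ratings(speed, quality):
    """Combine speed and quality ratings by simple addition (an assumption)."""
    return [s + q for s, q in zip(speed, quality)]

# Hypothetical ratings of six operators by each supervisor (5 = best, 1 = poorest).
sup_a = overall_ratings([5, 4, 3, 2, 1, 3], [4, 4, 3, 2, 2, 3])
sup_b = overall_ratings([5, 3, 3, 2, 1, 4], [5, 4, 3, 1, 2, 3])
agreement = pearson_r(sup_a, sup_b)
print(round(agreement, 2))
```

The same function serves for correlating the test scores with either criterion once real ratings are substituted for the illustrative lists.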


Production records were obtained for a one year period in order to
provide a further check on the trade test. The product moment correla-


The Trade Test 

Trial Sample. The trial sample consisted of a single piece of lightweight
nude cotton 5¾” by 7¾”. The subject is instructed to stitch
the cloth approximately a quarter inch from the edge.


Fig. 1. Worksample 1.




1. Bias Tape on Cotton. The first worksample consisted of a piece 
of lightweight nude cotton 51” by 8¼” onto which three pieces of white
cotton bias tape 6¼” by ¼” are to be sewed. The subject is instructed
to sew these strips onto the cotton cloth as straight as possible and to 
space them as well as possible without a ruler. 

2. Hemming Material. The second worksample consisted of a piece 
of lightweight nude cotton 5¾” by 7¾”. The subject is instructed to
sew a double lap hem into the material, turning a quarter inch under 
each time. 


Fig. 2. Worksample 2.


3. Stitching Between the Lines. The third worksample consisted of 
a piece of medium thickness cardboard 5¼” by 8¼” with a double lined
pattern to follow. The subject is instructed to follow the pattern drawn 
on the cardboard stitching without thread between the lines without 
going outside the lines. 

4. Stitching on the Line.² The fourth worksample consisted of a
piece of medium thickness cardboard (same as in worksample number
three) 574” by 8¼” with a single lined pattern to follow. The subject is
instructed to follow the pattern drawn on the cardboard stitching without 
thread and without going off the line. 


² The seeming similarity of Worksample numbers three and four is only superficial.




Administration of the Test 


The administration of the trade test is carried out by the use of 
shallow boxes lettered SAMPLE 1, 2, 3, 4. The material the subject is 
to sew with is placed inside the box along with an example done accurately 
and carefully so that the subject may see exactly what is to be done. 
Typewritten instructions on the inside bottom of the box tell the subject 
what to do. This method of utilizing both written and visual instruction
insures that persons of all types may be tested accurately and also pro- 
vides a uniform administration procedure. 

The boxes are given to the subject one at a time and the administrator 
instructs the subject to do the task neatly and accurately, but also 
quickly. The same instructions are given to operators on the production 
line. The time of each operation is taken from the moment the box is 
placed on the sewing machine until the operator has trimmed the threads 
or until the machine is stopped on the cardboard. 


Scoring the Test 


The trade test is scored for both speed and quality and also for a 
combination of these which gives an overall (summed) stitching score.³
Since each test is timed, the total time of each is added thus giving a speed 
score for the battery. The quality scoring is more complicated for many 
things must be taken into consideration. Worksample 1 and worksample 
2 were scored by setting distance limits for each operation and by counting 
incorrect stitches. In number 1 placement, parallelism, and evenness
were measured objectively with a ruler and the incorrect stitches counted. 
Handling, folding, and stitching were scored in the same way in number 2. 

The time needed to complete number 2 was also figured in the quality 
score by adding to the score for quickness and lowering the score for 
slowness, for the better stitchers can fold and handle the material more 
skillfully than the poorer operators. The quality scoring of worksamples 
3 and 4 was much simpler: the number of holes that were outside the 
lines in number 3 and the number of holes that did not touch in number 
4 gave the quality score. 
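
The simpler parts of this scoring can be sketched in a few lines. The sketch below covers only the speed score (total time over the four worksamples) and the hole counts for worksamples 3 and 4. The representation of each needle hole as a signed distance from the pattern, the tolerance values, the use of seconds and inches, and all of the figures are assumptions made for illustration; the article does not give them.

```python
def speed_score(times_seconds):
    """Total time over all worksamples; a lower total means a faster operator."""
    return sum(times_seconds)

def holes_outside(offsets, limit):
    """Count needle holes lying farther than `limit` from the pattern.

    offsets: signed distance of each hole from the centre of the pattern
             (hypothetical unit: inches).
    limit:   half the lane width for worksample 3, or the largest distance at
             which a hole still touches the line for worksample 4.
    """
    return sum(1 for d in offsets if abs(d) > limit)

# Hypothetical timings for worksamples 1 to 4 and hole offsets for worksample 3.
times = [95, 140, 60, 75]
offsets_ws3 = [0.0, 0.05, -0.2, 0.12]

total_time = speed_score(times)
errors_ws3 = holes_outside(offsets_ws3, limit=0.1)
print(total_time, errors_ws3)
```

The ruler measurements and time adjustment used for worksamples 1 and 2 are not specified closely enough in the text to be reproduced here.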

This scoring procedure is based on objective measurements and 
counting of stitches and thus can be carried on by others very easily.⁴
³ In combining these separate and distinct ratings into an overall
score, the two are more or less opposed to each other.
That is, if the time or speed rating is high the quality rating is apt to be low and
vice-versa. However, an accurate picture of the stitcher’s ability cannot be obtained
without combining these two scores because of the above fact. Of course

tba eae: on the overall score is made up of the quality and speed rating 

fos das bak rrite: peasy hes awed test, as well as the scoring method, are available 
be developed by industrial users. 




Results 

These results are clearly significant. The supervisors’ ratings yielded
higher correlations probably because the supervisors knew the operators’
work and could correct for the distortion that appeared in the production
records as explained in the footnote to Table 1. The two sets of correlations
and the combined criteria correlation do establish the trade test as
an instrument that will differentiate between the more highly skilled and
productive operators and the poorer and less productive operators.


Table 1
Trade Test and Criteria Correlations

                                                      Correlations
Trade Test and Supervisors’ Ratings:
    Speed Scores                                       .58 ± .095
    Quality Scores                                     .5B ± .099
    Overall Scores (Summed)                            .64 ± .084
Trade Test and Production Records:
    Speed Scores                                       .56 ± .097
    Quality Scores                                     BL ± .129
    Overall Scores (Summed)                            .53 ± .103
Trade Test and Combined Criteria:*
    Overall Scores and Combined Criteria (Summed)      .67 ± .078

* Combined by means of Z scores and also weighted in a 2-1 ratio, supervisors’
ratings to production records. This is because many operators of only mediocre skill
become accustomed to a single task and accumulate rather high earnings over a period
of time. This may also explain the somewhat lower correlations obtained with production
records.
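
The footnote to Table 1 describes the combined criterion as Z scores weighted 2-1, supervisors' ratings to production records. A minimal sketch of that combination follows; the function names, the use of the population standard deviation, and the figures for three operators are assumptions for illustration only.

```python
import statistics

def z_scores(values):
    """Convert raw scores to Z scores (mean 0, standard deviation 1)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD; an assumption, the source does not say
    if sd == 0:
        raise ValueError("Z scores are undefined when all values are equal")
    return [(v - mean) / sd for v in values]

def combined_criterion(supervisor_ratings, production_records, w_sup=2.0, w_prod=1.0):
    """Weighted sum of Z scores, supervisors' ratings to production records 2-1."""
    z_sup = z_scores(supervisor_ratings)
    z_prod = z_scores(production_records)
    return [w_sup * a + w_prod * b for a, b in zip(z_sup, z_prod)]

# Hypothetical overall ratings and production figures for three operators.
combined = combined_criterion([1, 2, 3], [10, 20, 30])
print([round(c, 2) for c in combined])
```

Standardizing first keeps the differing units of ratings and production from deciding the weights by accident, so that the 2-1 ratio is the only weighting applied.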


Summary 


The trade test for power sewing machine operators provides a method
by which the more highly skilled and productive operators may be
differentiated from the less skilled and less productive operators.
Such a test should be useful in selecting applicants who have the skills
needed in production and who, consequently will require a minimum of
on-the-job training.


Prediction of Job Success from the Application Blank 
Willard A. Kerr 
Illinois Institute of Technology 
and 
H. L. Martin 
Radio Corporation of America 


While considerable factual information is contained on the typical
industrial personnel application blank, little information is now available
to indicate the actual value of this information for predicting the probable
job success of the applicant. This study attempts to make a small contribution
to existing knowledge on this topic by obtaining correlations
between success on the job and such information items as sex, marital
status, possession of telephone, street address (i.e. part of city), age,
birthplace (in or out of state), children, dependents, height, weight,
previous employment with company, insurance, recent illness or operations,
number of personal references listed, organizations, hobbies, company
acquaintances, education, and previous positions for 244 employees
in the personnel, engineering, purchasing, production control, phonograph
record manufacturing, electronic tube manufacturing, and warehouse
departments of the Indianapolis plant of the RCA Victor Division of
Radio Corporation of America.

Success on the job measures for these 244 employees were obtained
from supervisors and the raw merit ratings (split half reliability of the
merit rating form was found to be above .75) from each supervisor were
transformed into standard dichotomous scores which were then plotted
with the information items to obtain tetrachoric coefficients of correlation
between job success and these variables. These are presented in Table 1.

Correlations significant at the five percent level or better are set in
boldface type.

On the basis of the highest correlations, eleven items were scored,
check-list fashion, and the total scores were correlated with job success
to obtain a coefficient of .35 ± .04.

Although all these findings should be accepted as tentative, it is
possible that some of the findings may be found to apply to most departments
of work in general industry. Analysis of item predictive value for
various types of work was not attempted here because of the limited
number of cases.
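
The article does not say how the tetrachoric coefficients were computed beyond plotting the dichotomized scores against each item. As a rough illustration only, the sketch below uses the well-known cosine-pi approximation to the tetrachoric coefficient for a fourfold table, which is a substitute technique and not necessarily the authors' procedure; the cell counts are hypothetical.

```python
import math

def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    Fourfold table layout (counts):
                        item present   item absent
        high success         a              b
        low success          c              d
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    if b * c == 0:
        if a * d == 0:
            raise ValueError("coefficient is undefined for this table")
        return 1.0  # no discordant cases at all
    ratio = math.sqrt((a * d) / (b * c))
    return math.cos(math.pi / (1.0 + ratio))

# Hypothetical table: 40 and 40 concordant cases, 10 and 10 discordant cases.
r_tet = tetrachoric_cos_pi(40, 10, 10, 40)
print(round(r_tet, 2))
```

The approximation is closest when both variables are split near the median, which is roughly what a dichotomy of merit ratings into high and low halves would give.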




Area B street address correlates positively with job success while
Area A address correlates negatively; this is regarded as surprising since


Table 1
Correlations between Job Success Ratings and Personal History Items

Female sex                                              -.16
Marital status: single                                  -.18
                married                                  .30
                divorced                                -.05
Telephone number (possession of)                         .07
Street address: Area A                                  -.22
                Area B                                   .23
                Area C                                   AS
                Area D                                  -.11
Age                                                      .08
Birthplace (in same state as plant)                      .15
Number of children                                       .00
Number of dependents                                     .00
Height of males                                         -.12
Height of females                                        .05
Weight of males                                         -.27
Former employee of same company                          .22
Holds insurance policy                                   .06
Recent illness or operation                              .00
Number of personal references listed                     sats bf
Number of organizations in which membership is held      .23
Number of hobbies                                       -.18
Number of company acquaintances                         -.09
Education: special training                              1S
           college                                       res
Number of previous positions


handicap for men. Former employment in the company seems to be
an asset, but possession of insurance and recent illness or operation
appear relatively unrelated. Listing of an excessive number of personal
references, hobbies, or previous positions is negatively related with the
criterion, but membership in organizations and special education are
positively related.

It should be emphasized that these correlations are low and the
findings possibly may apply only to the workers measured. Nevertheless,
it is interesting to note that approximately ten per cent of the variance




in job success of these 244 employees is accounted for by ‘“autobio- 
graphical” factors reported in the original applications for employment. 
Such check-list autobiographical scores may make a highly useful addition 
to the total predictive test battery. Naturally they should not be 
weighted more heavily in determining selection than their relative con- 
tribution to determination of job success variance indicates. In order 
to maintain the validity of the autobiographical scoring key, it should 
be revised periodically according to results of routine revalidations. 

Better results may be obtained in selecting for a specific job with this 
device than when using it to hire for the entire plant. Manson (1), for 
example, found a coefficient of correlation of .40 between the weighted 
scores on an application blank and the production records of life insurance 
salesmen, and Ohmann (2) obtained a correlation of .67 between his blank 
and the earnings of paint salesmen. 
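
The link between a validity coefficient and "variance accounted for" can be illustrated with a short sketch. It assumes the usual convention that the proportion of criterion variance associated with a predictor is the square of the correlation; the function name is illustrative only. The .35 coefficient is the one reported in the Summary of this article.

```python
def variance_accounted_for(r: float) -> float:
    """Proportion of criterion variance associated with a predictor
    that correlates r with the criterion (assumed convention: r squared)."""
    return r * r


# Coefficients quoted in the article; the labels are descriptive only.
coefficients = [
    ("check-list score, whole plant", 0.35),
    ("Manson, life insurance salesmen", 0.40),
    ("Ohmann, paint salesmen", 0.67),
]

for label, r in coefficients:
    print(f"{label}: r = {r:.2f}, r squared = {variance_accounted_for(r):.2f}")
```

On this convention a coefficient of .35 corresponds to roughly a tenth to an eighth of the variance, while the job-specific coefficients of .40 and .67 correspond to considerably more.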


Summary 


1. Most of the items on a typical industrial personnel application 
blank are easily quantified in check-list fashion on the basis of a previous 
item validation study against a job success criterion. 

2. In this study, when the original applications of 244 employees were 
scored check-list (unweighted) fashion with a validated key, the check- 
list raw scores were found to correlate .35 with the supervisory merit 
ratings of job success. 

3. Since in this study the application blank accounts for approximately
ten per cent of the variance in job success of an extremely heterogeneous
(almost “run of the employment office”) group of employees, it
seems reasonable that the application blank or a systematic
autobiographical inventory should become a standard part of the psychometric
battery in industry.

4. In view of the facts that background factors change in predictive
significance with time and their significance also is altered by changes in
the business cycle, the industrial psychologist should revalidate such an
instrument periodically.

5. When validation keys are developed for specific kinds of employees 
or job families, more substantial correlations are likely to be obtained 
both with the job success criterion and the tenure criterion. 
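
Unweighted check-list scoring of the kind summarized above might be sketched as follows. This is only an illustration under stated assumptions: the article does not reproduce its scoring key, so the item names and the rule of +1 for positively related items and -1 for negatively related items are hypothetical, with the signs taken from Table 1.

```python
from typing import Dict, Iterable

# Hypothetical key (an assumption, not the authors' key): +1 for an item that
# correlated positively with the merit ratings in Table 1, -1 for an item
# that correlated negatively. Every keyed item carries the same unit weight.
SCORING_KEY: Dict[str, int] = {
    "married": +1,
    "single": -1,
    "former_employee": +1,
    "address_area_a": -1,
    "address_area_b": +1,
}


def checklist_score(responses: Iterable[str],
                    key: Dict[str, int] = SCORING_KEY) -> int:
    """Sum the unit weights of the keyed items present on one application."""
    present = set(responses)
    return sum(weight for item, weight in key.items() if item in present)


# One hypothetical applicant: married, former employee, living in Area A.
print(checklist_score(["married", "former_employee", "address_area_a"]))
```

Because every keyed item counts one point in either direction, revalidation amounts to changing which items are in the key and with what sign, not to re-estimating weights.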

Received January 12, 1949.

References

1. Manson, G. E. What the application blank can tell. J. Personnel Res., 1925, 4

2. Ohmann, O. A. A report on selection of salesmen at the Tremco Manufacturing
Company. J. appl. Psychol., 1941, 25, 18-29.


Tests Used by United States Air Carriers 


Nicholas C. Feronte 
Marquette University 


The use of tests in the selection and promotion of employees is coming
into increasing favor with United States industry. The first marked
interest in such tests resulted from their use by the United States Army
in World War I. The extensive and successful use of tests by our armed
forces in World War II has given added impetus to the use of tests in
industry. In some areas personnel testing has aroused an almost
phenomenal interest among business executives.

A survey of the various psychological tests used by United States Air
Carriers (Scheduled and Certificated) is provided by the results of a
questionnaire received from 24 companies. Cooperation in replying to
the questionnaire was very gratifying and personnel directors of many
of the companies indicated they would like to receive a copy of the
results. A questionnaire was sent on July 14, 1948, to all companies
listed,¹ asking the personnel director of each company to indicate which
tests were most useful in selecting and promoting personnel within his
company.

Of the 24 companies that returned questionnaires, 13 indicated extensive
use of tests, 2 use tests only to select pilots, 2 stated they had not
begun operation, 1 disclosed tests were not used and would not be used
until there are at least ten applicants per job, and 6 indicated they were
small organizations, hence depended upon the ability of management
to determine the qualifications of those seeking employment. However,
they have felt the need of using more scientific methods.

The questionnaire listed 50 tests selected from the article published
by Louttit and Browne.² The tests were chosen on the basis of published
[word illegible] pertaining to the validity and reliability of each. Further
evidence of suitability was based on the author's use of the tests and the
comments found in Buros' book.³
¹ American Aviation Directory, Spring-Summer 1947, American Aviation Publications,
Washington, D. C. It lists thirty-six companies; however, five are operated by
companies included in the mentioned list.
² Louttit, C. M., and Browne, C. G. The use of psychometric instruments in
psychological clinics. J. consult. Psychol., 1947, 11, 49-54.
³ Buros, O. K. The Nineteen Forty mental measurements yearbook. Highland Park,
New Jersey, The Mental Measurements Yearbook, 1941.


Table 1
Tests Reported as Being Used Most Often

[Test names and the number of times each was listed are not recoverable from the scan.]



Only those tests reported as being used are listed. See Table 1.
Spaces were provided on the questionnaire and the personnel
director was asked to list such tests in use by his company which did
not appear on my questionnaire. This instruction produced a rather
[word illegible] list of additional tests. Other instructions on the
questionnaire were as follows:

How long have you used tests?

If you have used any tests and have abandoned the practice, will
you please name the test.

In the selection or promotion of what types of employees have you
used tests?

Who administers your test selection and administration program?


Table 2
Type of Employees Selected or Promoted by Tests

Type of Employees                Number of Times Listed
Apprentice                        8
Semi-skilled workers              7
Salesmen                          6
Clerical employees               10
Unskilled workers                 5
Navigators                        1
Skilled workers                   8
Pilots                            8
Foremen and/or supervisors        6
Executives                        2
Sales agents                      1
Flight engineers                  1
Stewardess                        2


An examination of the results in Table 1 indicates that the standardized
tests used most frequently are the Otis Self Administering,
[name illegible] (Wonderlic), Minnesota Multiphasic Personality Inventory,
Humm-Wadsworth Temperament, Bennett Mechanical Comprehension,
[name illegible] Clerical Aptitude Test, Stenquist Mechanical Aptitude, and
Minnesota Paper Form Board. The self developed tests listed most
often are typing, stenography, and memory test. Practically all companies
reporting administer an intelligence test and also a clerical test.
Table 2 lists the type of employees selected or promoted by use of
tests. To date, little use is made of tests to select or promote sales agents.
In general tests are used mostly to select office clerks, pilots, apprentices,
skilled workers, and semi-skilled workers.




The survey discloses that a few companies inaugurated testing only 
two years ago, while others reported having used tests to select and 
promote personnel for the last ten years. It is interesting to learn from 
the questionnaire that no company discontinued administering tests 
permanently once it began to use them. 

In replying to the question “Who administers your test selection and
administration program?” six listed the employment manager, four
delegated the assignment to the personnel clerk, three indicated the
responsibility was assumed by the personnel manager, and one stated a
clinical psychologist is employed full time to administer tests.

Perhaps the most interesting implication of this survey is that in 
general all air carrier companies are either using tests or have felt the 
need of using scientific methods for selecting and promoting personnel. 
Of equal significance is the fact that once a company inaugurated a 
testing program it was never halted permanently by management. The
various companies' interest in tests is evidenced by the use of a variety
of not only well known standardized psychological tests but
also company developed tests.

In general, companies reported using tests that have been found to 
be valid and reliable. 

It would be gratifying to flight passengers to learn that the survey
disclosed psychological instruments were used most often, in addition to 
selecting clerical employees, to select personnel definitely responsible for 
the maintenance and the flight operation of the airplane. 


Received February 18, 1949. 




A Factor Study of Worker Characteristics * 


Nathan Jaspen 
Pennsylvania State College 


In order to make the relationships between occupations more understandable,
the Occupational Analysis Division of the United States
Employment Service has selected the most significant job characteristics,
and assembled them into a rating form adapted after Viteles' Job
Psychograph (15, 16). This rating form has been applied to several thousand
occupations. Estimates of the most significant worker characteristics
required for each occupation are made independently by several trained
analysts. If, for example, an assembly job demands an unusual amount
of finger dexterity, the analyst indicates that dexterity of fingers is an
important worker characteristic for this occupation. This information
is punched on Speed Sort cards, which can then be sorted so that the
occupations which have various characteristics in common can be studied
and the relationship between them noted. The traits included in the
Worker Characteristics Form are listed in Table 1.

The Worker Characteristics Form includes 45 traits or abilities which
may be needed by the worker to do the job. A large number of important
“Job Families,” containing lists of occupations related to a single
occupation or to a limited number of selected occupations, have been established
on the basis of worker characteristics. These have had various uses:
to select workers for critical occupations from related occupations; to
transfer workers from occupations in which there were labor surpluses;
to upgrade employees on the job; and, to show the civilian occupations
related to military occupations (8, p. 703). Nevertheless, it is obvious
that the usefulness of the Form for some purposes would be increased if
it contained a smaller number of independent traits. This pilot study
was undertaken to determine what basic factors were being measured
* This paper is an abridgment of a master's thesis completed at the George Washington
University in 1944 under Dr. Thelma Hunt, chairman of the thesis committee.
The study was done in 1943-44 when the author was on the staff of the Occupational
Analysis Division of the United States Employment Service (then a part of the War
Manpower Commission). Acknowledgment is made to Dr. Carroll L. Shartle (now at
Ohio State University) and Dr. Beatrice J. Dvorak for permission to use data in the
files of the Occupational Analysis Division of the United States Employment Service;
[text missing] the George Washington University and the United States Employment Service
[text missing] is also made to Dr. Marion [text missing]




Table 1
Proportional Frequency with Which Traits in the Worker Characteristics Form Are
Rated as Required in Significant Degree in 275 Selected Occupations in the
Skilled, Semiskilled, and Unskilled Categories of Occupations

Characteristic Required of Worker                             Per Cent
 1.* Work rapidly for long periods                               17
 2.* Strength of hands                                           37
 3.* Strength of arms                                            47
 4.* Strength of back                                            27
 5.* Strength of legs                                            11
 6.* Dexterity of fingers                                        24
 7.* Dexterity of hands and arms                                 48
 8.  Dexterity of foot and leg                                   05
 9.* Eye-hand coordination                                       52
10.  Foot-hand-eye coordination                                  05
11.* Coordination of independent movements of both hands         11
12.* Estimate size of objects                                    11
13.  Estimate quantity of objects                                06
14.* Perceive form of objects                                    21
15.  Estimate speed of moving objects                            04
16.* Keenness of vision                                          25
17.  Keenness of hearing                                         01
18.  Sense of smell                                              01
19.  Sense of taste                                              **
20.  Touch discrimination                                        07
21.* “Muscular” discrimination                                   15
22.* Memory for details (things)                                 16
23.  Memory for ideas (abstract)                                 04
24.  Memory for oral directions                                  04
25.  Memory for written directions                               02
26.  Arithmetic computation                                      04
27.* Intelligence                                                09
28.  Adaptability                                                04
29.  Ability to make decisions                                   08
30.  Ability to plan                                             08
31.  Initiative                                                  05
32.* Understanding of mechanical devices                         15
33.* Attention to many items                                     16
34.  Oral expression                                             01
35.  Skill in written expression                                 **
36.  Tact in dealing with people                                 04
37.  Memory of names and persons                                 **
38.  Personal appearance                                         01
39.  Concentration amidst distractions                           03
40.  [entry illegible]
41.* Work under hazardous conditions                             24
42.* Estimate quality of objects                                 09
43.* Work under unpleasant physical conditions                   29
44.  Color discrimination                                        07
45.  Ability to meet and deal with public                        02

Added Characteristics
46.* Tools used                                                  68
47.* Knowledge of graphic instructions required                  17

                                                              Per Cent
  * Skill Level: Skilled                                         38
                 Semiskilled                                     41
                 Unskilled                                       21

* Characteristics which are included in this study.
** Less than one per cent.


by the Worker Characteristics Form. As it developed, only 20 of the
45 traits included in the Form were included in this study, so the factors
discovered have reference only to these 20 traits and not to the Form as
a whole. However, this is not a serious restriction, as none of the
remaining 25 traits was present in significant amount in as many as 10%
of a sample of the occupations so far studied.


Description of the Data 


The Worker Characteristics Form provides for estimates of 45 traits,
in an A, B, C, or O amount. The amounts designated by these letters are
(10, p. 176-178): A. A very great amount of the trait, such as would be
possessed by not more than 2 per cent of the general population; B. A
distinctly above-average amount of the trait, less than that designated
by A but more than that designated by C; C. An amount of the trait less
than that possessed by the highest 30 per cent of the general population;
and O. The trait is not required for the job.


job with people in 
The estimates are 


e job, 
Tf th 




The analyst is guided in his understanding of the meaning of the traits by
a manual embodying comprehensive definitions of the traits and examples of
the different quantities of each trait (3). Several, usually ten, separate
estimates are submitted by as many trained analysts who make their observations
and estimates in different plants and states. The reliability of these estimates
is not known, at least by the present writer. The information is collated by
an analyst at headquarters, and reviewed by an analyst of higher grade. The
final ratings are punched on Speed Sort cards, one for each occupation, in two
amounts: A or B on the one hand, for traits which are present in significant
amounts, and C or O, on the other, for traits which are not present in significant
amounts. The frequency with which these traits were rated as significant in
the study sample is shown (expressed in percentages) in Table 1.

About 9000 Speed Sort cards have been punched up to 1943. All kinds of
jobs are included: professional, managerial, technical, service, sales, clerical,
agricultural, skilled, semiskilled, and unskilled occupations. The present study
has been restricted to the categories of skilled, semiskilled, and unskilled
occupations.
Speed Sort cards are arranged in sequence by occupational code. The
skilled occupations are the 4-00 to 5-99 series; the semiskilled are the 6-00 to
7-99 series; and the unskilled are the 8-00 to 9-99 series. Two cards were
selected at random from each centile group from 4-00 to 9-99. This would
have yielded 1200 cards if the file had had sufficient cards in each centile code;
but in many cases a centile code was open, or no studies had been conducted of
occupations within the centile code, or only one study had been conducted, in
which case only one card was selected. Two hundred and seventy-five Speed
Sort cards [text illegible] about 7500. [text illegible]
represent different occupations, and no account is
taken of the relative number of workers in each occupation. Skilled jobs are
more differentiated than unskilled jobs. There is a certain bias in the
occupations selected by the United States Employment Service for study; the
cooperativeness of the different industries, and the matter of geography, are
only a few of the variable factors. No one knows how representative the 7500
Speed Sort cards are of skilled, semiskilled, and unskilled occupations in the
[text illegible] may be charitably regarded as
[text illegible] classification structure of a major part of the Dictionary of
Occupational Titles (14).
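
The card-selection rule described above (two cards at random from each centile code, fewer when the file holds fewer) might be sketched as follows. The data layout, the function name, and the fixed seed are illustrative assumptions; the article does not say how the random draw was made.

```python
import random
from collections import defaultdict

# Centile codes 4-00 through 9-99: six hundreds-series of 100 codes each.
CODE_GROUPS = [f"{major}-{minor:02d}" for major in range(4, 10) for minor in range(100)]
MAX_CARDS = 2 * len(CODE_GROUPS)  # 1200 if every code held at least two cards


def sample_cards(cards, per_group=2, seed=0):
    """cards: iterable of (code_group, card_id) pairs (hypothetical layout).

    Returns up to `per_group` randomly chosen cards from each code group;
    groups with fewer cards contribute what they have, empty groups nothing.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for group, card_id in cards:
        by_group[group].append(card_id)
    chosen = []
    for group in sorted(by_group):
        ids = by_group[group]
        for card_id in rng.sample(ids, min(per_group, len(ids))):
            chosen.append((group, card_id))
    return chosen


# Toy file: three cards under 4-00, one under 4-01, none elsewhere.
toy_file = [("4-00", "a"), ("4-00", "b"), ("4-00", "c"), ("4-01", "d")]
print(MAX_CARDS, len(sample_cards(toy_file)))
```

The shortfall from the theoretical 1200 to the 275 cards actually drawn reflects open or unstudied codes, which the sketch handles by simply skipping empty groups.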


Procedure 


Obtaining the Correlations. Since each characteristic, as punched on
the cards, was either present or not present, it was feasible to
compute tetrachoric correlation coefficients between pairs of characteristics.
Thurstone recommends that the coefficient not be computed
at all if one of the tail areas is less than 10 per cent of the total population
(1). Consequently, all characteristics which were significantly present in
less than 10 per cent, approximately, of the sample of jobs in the present
study were eliminated, since valid correlations for them could not be
computed. Only 20 of the worker characteristics survived this step.
In addition, the variables (46) “Tools used,” and (47) “Knowledge of
graphic instructions required,” remained. The twenty-third variable,
Skill, was expressed in three categories (skilled, semiskilled, and
unskilled) and was determined from the occupational code of each of the
275 occupations in the study sample. The intercorrelations between
skill and the other 22 variables were found from 2 × 3-fold tables with
unequally spaced intervals. Normal distributions and rectangular
regression were assumed, each interval was assigned the value of its mean
expressed as a deviate on a unit normal curve, and Pearson product-moment
r's were computed. A correction for “broad categories” was
then applied (4, pp. 167-171; 7, pp. 399-402).


Table 2
[Matrix of intercorrelations among the 23 variables. The table was printed
sideways and its entries are not recoverable from the scan.]
* All entries have been multiplied by 100 in order to eliminate the decimal point.
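
The screening rule and the correlation step for a pair of present/absent characteristics can be sketched as follows. Note the substitution: the study read its coefficients from the computing diagrams of reference (1), whereas this sketch uses the well-known cosine-pi approximation to the tetrachoric coefficient, so its values are only approximate. The function name and the handling of empty cells are assumptions.

```python
import math


def tetrachoric_cos_pi(a, b, c, d, min_tail=0.10):
    """Approximate tetrachoric r for a fourfold table.

    a = both characteristics present, b = first only,
    c = second only, d = neither.
    Returns None when either characteristic has a tail area below
    `min_tail`, following the 10 per cent rule quoted in the text.
    """
    n = a + b + c + d
    if n == 0:
        return None
    p_first = (a + b) / n
    p_second = (a + c) / n
    if min(p_first, 1 - p_first, p_second, 1 - p_second) < min_tail:
        return None
    if b == 0 or c == 0:  # no discordant cases in one direction
        return 1.0
    if a == 0 or d == 0:  # no concordant cases in one direction
        return -1.0
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


# Hypothetical counts for two characteristics over 100 occupations.
print(tetrachoric_cos_pi(40, 10, 10, 40))
print(tetrachoric_cos_pi(5, 0, 45, 50))  # first trait present in only 5 per cent
```

The second call returns None because one marginal falls below the 10 per cent limit, which is the situation that removed 25 of the 45 traits from the analysis.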

The matrix of intercorrelations is shown in Table 2. 
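
The scoring of the three Skill categories as mean deviates on a unit normal curve can be sketched as below. The proportions (21, 41, and 38 per cent) are those given for the skill levels in Table 1; the ordering from unskilled (lowest) to skilled (highest) and the function name are assumptions, and the correction for broad categories is not shown.

```python
from statistics import NormalDist


def category_mean_deviates(proportions):
    """Mean unit-normal deviate of each ordered category, lowest first.

    For a slice of the normal curve between cut points z1 and z2 the mean
    deviate is (pdf(z1) - pdf(z2)) / (area of the slice).
    """
    unit_normal = NormalDist()
    means = []
    cumulative = 0.0
    lower_density = 0.0  # density at minus infinity
    for index, p in enumerate(proportions):
        cumulative += p
        if index == len(proportions) - 1:
            upper_density = 0.0  # density at plus infinity
        else:
            upper_density = unit_normal.pdf(unit_normal.inv_cdf(cumulative))
        means.append((lower_density - upper_density) / p)
        lower_density = upper_density
    return means


# Assumed order: unskilled, semiskilled, skilled.
skill_proportions = [0.21, 0.41, 0.38]
for label, z in zip(["unskilled", "semiskilled", "skilled"],
                    category_mean_deviates(skill_proportions)):
    print(f"{label}: {z:+.2f}")
```

These category values then stand in for Skill when the product-moment correlation with each dichotomous characteristic is computed.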

The Factor Analysis Procedure. Eight centroid factors were extracted 
from the matrix of intercorrelations, by Thurstone’s Centroid method 
(13). The centroid matrix is shown in Table 3. Estimates of the com- 
munality were used in the diagonal cells of the correlation and the 
residual matrices (13, p. 89), the estimates being the highest coefficients 
in each column. In the successive extractions the characteristics were 


Table 3

The Centroid Matrix F: Projections of Worker Characteristics Vectors on
8 Arbitrary Orthogonal Axes Determined by the Centroid Method *

           I    II   III    IV     V    VI   VII  VIII    h²
Skill    [?]   [?]   [?]    27    12    23   -04   -18    78
 1       [?]   [?]   [?]   [?]   [?]   [?]   [?]   [?]    49
 2        58   -45   -34    18   -22   -19   -19    26    88
 3        45   -78   -23    08   -25   -13   -11    07    97
 4        23   -81   -16   -09   -39    14    16   -12    95
 5       [?]   -71   -17   -10    37    38    09   -12    93
 6        28    63   -37    12   -06    27    18   -08    74
 7        53    18   -41    39    38   -16    07   -03    79
 9        63    38   -59   -32    30    07    07   -20    98
11       [?]   [?]   [?]    51    08    08   -12   -15    69
12        46    11   -16   -42    18    15   -40    09    65
14        66    25   -29   -32    19    16   -12    26    83
16       [?]   [?]   [?]   [?]   [?]   [?]   [?]   [?]    63
21        47    18   -22   -30    20   -39    18   [?]    63
22        49    21    60    14   -18    22   -15    22    82
27        33   [?]    50   -06   -13    10    17   -21    49
32        45    23    28    05   -30   -13    36    11    58
33       [?]   [?]   [?]   [?]   [?]   [?]   [?]   [?]    88
41        21   -54    36    15    30   -10   -24   -13    66
42       [?]   [?]   [?]   [?]   [?]   [?]   [?]   [?]    58
43       [?]   [?]   [?]   [?]   [?]   [?]   [?]   [?]    68
46       [row illegible]
47       [row illegible]

* All entries have been multiplied by 100 in order to eliminate the decimal point.
[?] marks an entry that is illegible in the scan.
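
The h² column of Table 3 is the communality of each variable, that is, the sum of its squared projections on the eight centroid axes. A small check on two rows that read cleanly in the scan (Characteristics 2 and 4) is sketched below; the function name is illustrative.

```python
def communality(loadings_times_100):
    """Sum of squared loadings, with entries given as printed (multiplied by 100)."""
    return sum(value * value for value in loadings_times_100) / 10000.0


# Rows as they read in Table 3 (columns I through VIII).
row_2 = [58, -45, -34, 18, -22, -19, -19, 26]   # Strength of hands, printed h2 = 88
row_4 = [23, -81, -16, -9, -39, 14, 16, -12]    # Strength of back, printed h2 = 95

for label, row in [("2", row_2), ("4", row_4)]:
    print(label, round(communality(row) * 100))
```

Rows whose printed entries do not reproduce the printed h² in this way are likely to contain a misread digit.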




Table 4
The Transformation Matrix A *

[Columns A through H; entries not recoverable from the scan.]

* All entries have been multiplied by 100 in order to eliminate the decimal point.


Table 5
Rotated Factorial Matrix V = FA *

[Columns A through H; entries not recoverable from the scan except one partial,
unlabeled row: 37 -05 -03 52 -29.]

* All entries have been multiplied by 100 in order to eliminate the decimal point.


reflected by a method which maximized the amount of the total variance
that was accounted for by each new factor (13, pp. 99-100).

After eight extractions, the extraction process was discontinued because
some communalities appeared to be spuriously great, and in fact
the ninth centroid factor would have increased two of the communalities
to more than unity. Neither Coombs' criterion (2) nor McNemar's
criterion (6) for the number of factors appeared applicable. This may
have been because the scores in this study were based on judgment rather
than tests.
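
One extraction step of the centroid method may be sketched as follows. This is a simplified illustration, not the full procedure of reference (13): the communality estimate is taken as the highest absolute coefficient in each column, as the text describes, but the reflection of variables between extractions is omitted, and the function name is an assumption.

```python
import math


def first_centroid_factor(r):
    """Extract one centroid factor from a square correlation matrix `r`.

    Returns (loadings, residual). The diagonal is replaced by the highest
    absolute off-diagonal coefficient in each column before summing.
    Sign reflection of variables is NOT performed in this sketch.
    """
    n = len(r)
    work = [list(row) for row in r]
    for j in range(n):
        work[j][j] = max(abs(r[i][j]) for i in range(n) if i != j)
    column_sums = [sum(work[i][j] for i in range(n)) for j in range(n)]
    total = sum(column_sums)
    if total <= 0:
        raise ValueError("reflect variables first so that the matrix sum is positive")
    root = math.sqrt(total)
    loadings = [s / root for s in column_sums]
    residual = [[work[i][j] - loadings[i] * loadings[j] for j in range(n)]
                for i in range(n)]
    return loadings, residual


# Hypothetical three-variable matrix with every intercorrelation equal to .50.
toy = [[1.0, 0.5, 0.5],
       [0.5, 1.0, 0.5],
       [0.5, 0.5, 1.0]]
loadings, residual = first_centroid_factor(toy)
print([round(a, 2) for a in loadings])
```

Repeating the step on each residual matrix adds one factor at a time; the communality of a variable is the running sum of its squared loadings, which is why an extraction too many can push it past unity.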

The arbitrary axes of the centroid matrix were then rotated by
Thurstone's method of extended vectors (11) until a solution was
obtained which for the most part satisfies the requirements of simple
structure. The transformation matrix A is shown in Table 4, and the
rotated factorial matrix is shown in Table 5. An effort was also made to
keep the correlations between the primary factors as close to zero as
possible. The range of these correlations is from .16 to -.14.


Interpretation of Factors 


The significant factor loadings are here considered to be those of .40
or above. Thurstone notes that “the naming of a factor cannot be
[word illegible] with confidence unless the projections are as large as .50 or .60 so
that the factor accounts for a fourth or a third of the variance of a test”
(p. 79) or measure. Loadings below .20 are of no significance.

Factor A. The characteristics which enter significantly into Factor
A, and their loadings, are: 4. Strength of back, .[?]1; 5. Strength of legs,
.83; and 2. Strength of hands, .62.

Factor A has been designated as Strength. All of the loadings are
phenomenally high. The intercorrelations also were very high, indicating
the correspondence between the traits.

Factor B. The following characteristics have significant loadings on
Factor B: 33. Attention to many items, .88; 22. Memory for details (things),
.77; 27. Intelligence, .57; 32. Understanding of mechanical devices, .56;
[item illegible], .40; and 47. Graphic instructions, .37.

[text missing] industrial jobs of a semi-clerical character.

Factor C. Factor C has the following loadings: 12. Estimate size of
objects, .64; 16. Keenness of vision, [loading illegible];
Skill level, .52; 42. Estimate quality of objects, .41; and 21. “Muscular”
discrimination, [loading illegible].

Factor C appears to be an Inspection factor, perhaps predominantly
visual inspection. Characteristic 21 refers to kinesthetic sensitivity,
[text illegible]




Factor D. Factor D has significant loadings as follows: 43. Work
under unpleasant physical conditions, .75; 41. Work under hazardous
conditions, .70; and 6. Dexterity of fingers, -.36.

This factor, which may be nothing more than a doublet, has been
designated Physically Unpleasant Working Conditions. The relatively
negative loading of Characteristic 6 may be a sampling error, or

adingly physical, 

Factor E. The following significant loadings appear in Factor E: 9.
Eye-hand coordination, .91; 7. Dexterity of hands and arms, .78; 11.
Coordination of independent movements of both hands, .78; 6. Dexterity
of fingers, .[?]0; and 1. Work rapidly for long periods, .30.

This factor appears to be primarily Manual Dexterity. Eye-hand
coordination has the highest loading on this factor, but the distinction
between this characteristic and Characteristic 7 is not entirely clear.
The correlation between the two characteristics is .51. The example
cited in the manual on Worker Characteristics for an A amount of
Characteristic 7 is Drill-Press Operator (3, p. 16); and for Characteristic
9 the example cited is Engraver, Hand IV (3, p. 19). A case might be
made for the interchangeability of the two examples. Whether the two
characteristics are substantially the same is not, of course, established
by this study. The fact of the relationship as here measured is merely


Factor F. Factor F has the following significant loadings: 2. Strength
of hands, .52; 42. Estimate quality of objects, .48; 3. Strength of arms,
[loading illegible]; 21. “Muscular” discrimination, .37; and 32. Understanding of
mechanical devices, .34.

It is not believed that this factor is psychologically meaningful as it
[word illegible]. Had enough care been taken it is possible that the factor would
have converged with the A and C planes. Perhaps Characteristics 2
and 3 and Characteristics 21 and 42 might have been established as
independent factors.


Factor G. Factor G has the following significant loadings: 46. Tools used,
[loading illegible]; 47. Knowledge of graphic instructions required, .52;
[text missing]




p. 165). In this instance, it is not known whether what is indicated is 
that workers with a fund of mechanical information do not work rapidly 
or that they do not work for long periods of time at a stretch. 

Factor H. Factor H is a residual factor, with no significant loadings 
and apparently without psychological meaning: 16. Keenness of vision, 
.39; 21. “Muscular” discrimination, .30; and 27. Intelligence, .30. 


Summary 


The 20 Worker Characteristics, together with Skill and two other job 
characteristics, have been reduced to six meaningful factors: Strength, 
Intelligence, Inspection, Physically Unpleasant Working Conditions, 
Manual Dexterity, and Mechanical Information. This economy has 
been effected at the cost of a certain loss in specificity. Whether factor 
scores for the 275 occupations would be as valuable as a larger number of 
more specific scores depends on the use to which the information is to be 
put; just as the information contained in the Worker Characteristics 
Form is more and less valuable in various respects than the extended 
information in a voluminous job analysis. Certainly for the purpose of 
establishing a limited number (less than fifty) of occupational fields 
distinguished on the basis of worker characteristics for use in counseling, 
six factors are perhaps as many as can be considered. In this event it 
becomes important to find six independent and fundamental factors. 

Such factors may be sufficient for the broad mass of industrial jobs. 
At the professional level they would not; there it would certainly be 
important to break up Intelligence at least into the verbal, numerical, 
and spatial factors commonly found in the literature. 

A not inconsequential finding is that of the Strength factor. Aptitude
tests for occupational selection do not ordinarily include a strength test,
nor is it important that they should. But for many jobs at least a coarse
evaluation of the strength of the applicant should be made in conjunction 
with and prior to test administration. There is little point in testing the 
manual dexterity of a physically weak applicant for a job which requires 
both strength and manual dexterity. 

Received January 8, 1949. 


References 


1. Chesire, L., Safir, M., and Thurstone, L. L. Computing diagrams for the tetrachoric
correlation coefficient. Chicago: University of Chicago Bookstore, 1933.

2. Coombs, C. H. A criterion for significant common factor variance. Psychometrika,
[year, volume, and pages illegible].

3. Employment Office Service Division. Rating of worker characteristics. Washington:
War Manpower Commission, 1943. Multilithed.

4. Kelley, T. L. Statistical method. New York: Macmillan, 1924.

5. Landahl, H. D. Centroid orthogonal transformations. Psychometrika, 1938, 3,
219-223.

6. McNemar, Q. On the number of factors. Psychometrika, 1942, 7, 9-18.

7. Peters, C. C., and Van Voorhis, W. R. Statistical procedures and their mathematical
bases. New York: McGraw-Hill, 1940.

8. Shartle, C. L., Dvorak, B. J., and Associates. Occupational analysis activities in
the War Manpower Commission. Psychol. Bull., 1943, 40, 701-713.

9. Stead, W. H., and Masincup, W. E. The occupational research program of the United
States Employment Service. Chicago: Public Administration Service, 1941.

10. Stead, W. H., and Shartle, C. L. Occupational counseling techniques. New York:
American Book Company, 1940.

11. Thurstone, L. L. A new rotational method in factor analysis. Psychometrika,
1938, 3, 199-218.

12. Thurstone, L. L. Primary mental abilities. Chicago: University of Chicago Press,
1938.

13. Thurstone, L. L. The vectors of mind. Chicago: University of Chicago Press, 1935.

14. U. S. Employment Service, U. S. Department of Labor. Dictionary of occupational
titles. Washington: U. S. Government Printing Office, 1939.

15. Viteles, M. S. Job specifications and diagnostic tests of job competency designed
for the auditing division of a street railway company. Psychol. Clinic, 1922, 14,
88-105.

16. Viteles, M. S. Industrial psychology. New York: Norton, 1932.


Reported and Demonstrated Values of Vocational Counseling 


Rose G. Anderson 
The Psychological Corporation, New York City 


The question most frequently asked by individuals considering vocational
counseling is, “How successful is such counseling?” Consultants
also have a vested interest in the answers to the questions, “What are the
values which the individual feels he has gained from the counseling
process?”; “What are the demonstrable practical outcomes of such
counseling?”

There is a dearth of scientific evidence bearing on these questions. 
Much of the content of popular articles on the subject tends to mislead 
rather than inform the public as to the aims of qualified counselors and 
as to the justifiable expectations of the individual counseled. 

Careful investigation in these areas is important because of the in- 
creased public interest and professional activity in vocational counseling. 
It is also essential for the contributions which may be made to
improvements in counseling procedures.

Recognition of the desirability for such research is reflected in in- 
vestigations which have been initiated and reported by Veterans’ Advise- 
ment Centers (1, 2, 3). 

Two independent but related counseling projects afforded the writer 
the opportunity to investigate: (1) the evaluations of their counseling 
experience by a group of civilian industrial employees; and (2) the prac- 
tical outcomes of counseling for a group of ex-Service men. 

The first of these projects was initiated by the Woodward Governor
Company during World War II in the interests of the post-war
readjustments of temporary employees and in the interests of the most
effective use of the employees' skills. Comprehensive vocational
counseling by a unit¹ directed by the writer was made available to all
employees at company expense. A total of 1184 (85 per cent) of plant
personnel at all occupational and executive levels availed themselves of
this opportunity.

The counseling was completed the week of V-J Day. Prior to this
period, a follow-up questionnaire designed to evaluate the benefits of the
counseling experience had been circulated to 1086 counselees, 671 men
and 415 women. The number of returns was limited by the dropping


¹ The full-time counselors were: Olive Bray, Ralph Filburn, and William Van
Ne[remainder of name illegible]. Miss Bray supervised the tabulation and analyses of
the questionnaire returns.




of [?]00 employees the Monday after V-J Day, and by the fact that a
considerable number of employees who had previously left the plant could
not be reached. In addition, 51 men who had been counseled just before
induction into the Services were not sent questionnaires.

A total of 685 returns was received out of 1086 questionnaires
distributed, a return of 62.1 per cent for men, 64.6 per cent for women,
and 63.1 per cent for the total group.
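
The return rates can be rechecked from the counts reported in the text. The following is a minimal sketch, not part of the original study; the variable and function names are illustrative, and the returns by sex are assumed to be the sums of the "in plant" and "left plant" case counts given in Table 1.

```python
# Questionnaires distributed, as stated in the text.
distributed = {"men": 671, "women": 415}

# Returns by sex: assumed to be "in plant" + "left plant" cases from Table 1.
returned = {"men": 366 + 51, "women": 171 + 97}


def return_rate(returns: int, sent: int) -> float:
    """Percentage of questionnaires returned, rounded to one decimal place."""
    return round(100 * returns / sent, 1)


rates = {group: return_rate(returned[group], distributed[group])
         for group in distributed}
rates["total"] = return_rate(sum(returned.values()), sum(distributed.values()))

print(rates)
```

The two group counts add up to the 685 returns and 1086 questionnaires quoted for the whole group, so the total rate follows from the same figures.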
Excerpts are quoted from the statement accompanying the questionnaire:
“The only interest in the returns is in getting a frank unbiased
expression of opinion from each member as to what the counseling has
meant to him or her. The counselors are not asking for bouquets,
although they don’t object to them. They are looking for any clews
they can get as to whether, and if so how, the service can be made more
useful.” The following statement was included from the company
president: “. . . At the present time, Rockford industrial, commercial
and agricultural leaders are seriously considering a vocational counseling
set-up for returning service men from this area. You will help these men
make a more intelligent decision in regard to such a program, and you
will make a contribution to those who are developing and improving
counseling procedures if you will give them the benefit of your
experience.” . . . “You will notice that a number appears in the space
for Name. This is your code number known only to your counselors.
You need not sign your name unless you wish to.” . . . “The management
will not read individual questionnaires and is not interested in who
says what, but in the trend of results.”

It would appear that the accompanying statements encouraged critical [illegible]


Counseling Procedures


A preliminary description of counseling procedures is pertinent to the
presentation of results.


Vocational counseling as practised in this project involved the integration
of comprehensive measures of aptitudes and interests with: (1) detailed
information about the individual’s personal, educational, and vocational
background; (2) the evaluation of personality characteristics and emotional
adjustment through both interviews and objective measures; (3) available
resources for training or avocational expression. The resulting integration
was then used in helping the counselee to assimilate the interpretations
evolving from the study, to arrive at decisions and implement them through
appropriate measures.

The first step in setting up the project was a survey of community resources.
This was conducted by Dr. Ruth Cavan, a University of Chicago
sociologist resident in Rockford. Dr. Cavan or her assistants personally
interviewed the representatives of all institutions and agencies which offered
educational, avocational, and vocational programs. All pertinent information


462 Rose G. Anderson 


with respect to their offerings was compiled in a cross-referenced file for the use 
of the counselors. Information about educational resources outside of the city 
supplemented this file.

A second preliminary project was the assembling of a loan library in the 
plant. This included a comprehensive range of sources of vocational informa- 
tion, selected books on personal adjustment, child guidance, and adolescent 
psychology. 

The plant project was initiated through orientation talks attended by all 
plant personnel. In these the procedures, aims and possible outcomes of 
counseling were discussed. Subsequent to this a common battery of tests was 
administered to all plant personnel in groups of 50-100. This served the dual 
purpose of introducing the testing program and of developing general and 
differential plant norms for the tests used. The preliminary battery of tests 
included the Adult Placement Test (a mental alertness test with verbal and 
numerical sections), the Psychological Corporation General Clerical Test, 
and the Revised Minnesota Paper Form Board Test. The Allport-Vernon 
Study of Values was used as an initial approach to interest trends. This 
measure was used with the expectation that it would be less applicable to those
with limited educational backgrounds. Contrary to expectations, uniformly
active interest was demonstrated by all.

After the completion of this testing schedule, each employee made his own 
decision as to whether he wished to continue with the comprehensive counseling. 
Individual interview appointments were then scheduled. 

Prior to the individual interview, the counselor assembled the above test
results, and the information in the plant personnel files. The latter included:
(1) a comprehensive application questionnaire covering personal and family
data, education and recreational activities; (2) previous employment references;
(3) a plant employment record including periodic supervisors’ evaluations and
ratings, salary increases, transfers and promotions; (4) results of medical and
visual examinations, and (5) the results of objective tests (chiefly trade
information and arithmetic) administered as part of the application procedures.

In the first individual interview, the counselor corroborated and supple- 
mented the above types of information, explored the counselee’s attitudes 
toward his current position and job transfers, his educational objectives, and 
his social and personal relationships.

Two most important aspects of this interview were the establishing of a 
favorable relationship with the counselee and the opportunity to develop 
insights into his personality organization. At the termination of this first 
interview, the counselor asked the counselee to fill out the Bernreuter
Personality Inventory, indicating it would be helpful in supplementing the
interview. This was returned directly to the counselor. This practise was found
to be essential if this instrument was to contribute any value.

On the basis of the interview and the compiled information, a supplementary
appropriate testing battery was scheduled and individually administered
by a psychometrist supervised by one of the counselors. The additional
tests were chosen from a wide range of tests including intelligence tests,
mechanical, artistic, musical, scientific, language, and clerical aptitude tests,
achievement tests, and additional interest inventories.

A subsequent interview was scheduled when the counselor felt he had a
basis for arriving at interpretations and recommendations. The integrity of
the counselors in respecting the confidences of the employees led to counseling
on intimate personal problems in many instances. Employees freely asked for
further [illegible]. The plant physician cooperated actively on medical
problems.

The maximum weekly caseload for each counselor was eight individuals.

The same procedures were followed in working with the veterans with the
exception of greater dependence on the interviews for personal and background
information.

Two of the counselors had no previous experience in vocational counseling,
but both had experience in applied psychology. One held the MA
degree in psychology, the other had completed his course requirements for the
Ph.D. degree in psychology. The third counselor held a Master’s degree in
vocational education and had had extensive experience in vocational education,
educational guidance, and in industry both as an employee and a consultant.
The first months of the project constituted an in-job-training program for
the counselors under the writer’s guidance through review of cases and staff
conferences. Such guidance was continued in periodic visits throughout the
period of both projects.


 1. As a result of your counseling did you get a better idea
    a. Of your strongest abilities? .............................. Yes  No  Doubtful
    b. Of your less strong abilities? ............................ Yes  No  Doubtful
    c. Of your personality traits in relation to fields of work?   Yes  No  Doubtful
    d. Of ways of promoting your personal development? ........... Yes  No  Doubtful
    e. Of other fields of work or training you might transfer to?  Yes  No  Doubtful
 2. Will your counseling report influence your decisions as to
    future jobs or training? ..................................... Yes  No  Doubtful
 3. a. Has your counseling report already influenced your decision
       as to a job or a field of training? ....................... Yes  No  Doubtful
    b. If so, was the decision in accord with the results of your
       counseling? ............................................... Yes  No  Doubtful
 4. [Did you find that you had underestimated your aptitudes]
    a. [For particular jobs?] .................................... Yes  No  Doubtful
    b. [In general?] ............................................. Yes  No  Doubtful
 5. Did you have ambitions which your test results did not support
    a. For particular jobs? ...................................... Yes  No  Doubtful
    b. In general? ............................................... Yes  No  Doubtful
 6. Did your counseling
    a. Increase your general self-confidence? .................... Yes  No  Doubtful
    b. Decrease your general self-confidence? .................... Yes  No  Doubtful
 7. Did your counseling on the whole give you a better
    understanding of yourself? ................................... Yes  No  Doubtful
 8. Do you feel that your counseling was a worthwhile experience?  Yes  No  Doubtful
 9. Do you recommend that Woodward Governor continue
    such counseling? ............................................. Yes  No  Doubtful
10. Would you recommend it to others at their own expense? ....... Yes  No  Doubtful

Comment: (In the space below answer the question and make any further
         comments you would like to make. You may use the reverse side
         of the paper if you wish.)
At what age do you think such counseling should be provided? ........ years.

Fig. 1. Questionnaire used in counseling follow-up study.


General Results 


The complete questionnaire is reproduced in Figure 1. 

The analyses of the replies to the different items, for men and women
separately and combined, and for the plant employees and those who
had left the plant, are presented in Table 1.


Table 1
Questionnaire Returns from Individuals Counseled

                  Men            Women
              In     Left     In     Left
             Plant   Plant   Plant   Plant             Total
No. Cases     366     51      171     97                685

               %       %       %       %      %     %      %        %
Question      Yes     Yes     Yes     Yes    Yes    No  Doubtful  No Reply
 1(a)          63      76      75      91     71    17      9        3
  (b)          57      69      68      78     64    18     12        6
  (c)          60      73      65      76     65    16     14        5
  (d)          55      65      60      66     59    21     13        7
  (e)          41      63      61      77     53    30     11        6
 2             46      55      49      66     50    29     19        2
 3(a)          29      47      18      43     29    61      6        3
  (b)          24      43      17      36     26    18      5       51
 4(a)          31      51      37      58     38    46      9        7
  (b)          26      27      42      36     32    49     12        8
 5(a)          33      27      29      40     33    54      6        7
  (b)          18      18      24      26     20    60      7       13
 6(a)          55      80      57      57     58    28     11        3
  (b)           7       6       6       3      6    63      8       22
 7             66      80      70      88     71    17     11        1
 8             76      90      84      96     82     8      9        2
 9             64      82      70      89     70    ??     12        3
10             63      71      64      80     66    16     15        3

(?? marks an entry illegible in the source copy.)
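
One way to check the table's internal consistency is to compare each "Total" per cent with the case-weighted average of the four group per cents. The following is a rough sketch under that assumption, not a procedure described by the author; the names are illustrative, and because the group figures are already rounded, a tolerance of one percentage point is allowed.

```python
# Number of returns in each group (Table 1, "No. Cases" row).
cases = {"men_in_plant": 366, "men_left": 51,
         "women_in_plant": 171, "women_left": 97}

# Per cent "Yes" by group for three questions, paired with the reported total.
yes_by_group = {
    "1(a)": ((63, 76, 75, 91), 71),
    "8": ((76, 90, 84, 96), 82),
    "10": ((63, 71, 64, 80), 66),
}


def pooled_percent(group_percents, group_sizes):
    """Case-weighted average of the group percentages."""
    total = sum(group_sizes)
    return sum(p * n for p, n in zip(group_percents, group_sizes)) / total


sizes = list(cases.values())
for question, (percents, reported_total) in yes_by_group.items():
    pooled = pooled_percent(percents, sizes)
    # Group figures are rounded, so allow one point of slack.
    assert abs(pooled - reported_total) <= 1
    print(question, round(pooled, 1), reported_total)
```

The same check can be applied to any other row once its entries are legible.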


The positive responses for the total group of 685 are presented at the
right side of the Table. The results indicate that: 71 per cent got a
better idea of their strongest abilities; 38 per cent found they had
underestimated their aptitudes for particular jobs, 32 per cent in general;² 71
per cent got a better understanding of themselves; 65 per cent and 59
per cent respectively got a better understanding of their personalities in
relation to fields of work or of ways of promoting their personality
development. Although 33 per cent reported ambitions not supported
by test results for particular jobs, and 20 per cent reported general
ambitions not supported by test results, 58 per cent reported increased
self-confidence; only 6 per cent, decreased self-confidence.³

² The amount of over-lap in these per cents is not available.

With respect to the vocational questions, 53 per cent got a better
idea of vocational transfer possibilities; 50 per cent expect future
vocational or training decisions to be influenced by their counseling; 29 per
cent had already made such decisions, 26 per cent in accord with the
results of their counseling. With respect to the last two figures, it should
be noted that 21.6 per cent of the total group had left the company and
had had the occasion to make use of the counseling. The 51 per cent
who did not reply to question 3(b) includes those who answered 3(a) in
the negative. Some individuals answered “No” to both 3(a) and 3(b).
This accounts for the apparent discrepancy between the “Yes” answers
to 3(a) and the “Yes” and “No” answers to 3(b).

The lower per cents of positive replies to Questions 9 and 10 (70 and
66 per cent respectively) may suggest skepticism as to the sincerity of
the replies to Question 8 (82 per cent). The ensuing discussion has a
bearing on this.
The project was initiated by orientation discussions attended by all
plant personnel. At this time, emphasis was placed upon the fact that
counseling was a cooperative affair and that individuals would profit
who had a genuine desire for better self-understanding. The
suggestion was made that individuals should elect the counseling because
of such a desire and not because of curiosity nor because the counseling
was to be at the company’s expense. A considerable number who did not
request counseling at the beginning of the project later asked for it, after
reports began circulating through the plant as to the results. The final
number counseled included a certain proportion of employees who had
no serious need for counseling nor intention to put the results to practical
use. When this is taken into consideration, the indication is that the
per cents for the positive replies are lower than they would be for
individuals requesting counseling because of a felt need. The per cents
answering Questions 9 and 10 affirmatively may represent the proportion [illegible]


Differential Results 
Examination of Table 1 reveals generally more favorable replies by
employees who had left the plant than by those still employed; also [illegible].
Ninety per cent of the men and 96 per cent of the women who had left
the plant answered Question 8 in the affirmative; 71 per cent and 80 per
cent respectively answered Question 10 in the affirmative. The majority
of the employed women were war-time employees. The more favorable
replies by those who had had the occasion to apply their counseling or who
anticipated a change in employment are considered significant.

³ The 22 per cent not replying to 6(b) include the 19 per cent who did answer 6(a).

The responses reflecting positive contributions were analyzed ac- 
cording to age, years of education, and percentile on the Adult Placement 
Test (a mental alertness test with verbal and numerical sections) for both 
men and women. 

The results for the men are reported in Table 2. 


Table 2
Affirmative Responses of 366 Men to Counseling Questionnaire Classified by Age,
Education and Tested Ability

                    Age               Years of Education        Ability Percentile
            (18-25) (26-40) (41+)   (-8) (9-11) (12) (13+)   (1-30) (31-70) (70+)
No. Cases      28     254     84     57    70    166   73      91     134    141
                %       %      %      %     %      %    %       %       %      %
Question
 1(a)          68      67     50     53    73     64   58      60      61     66
 1(c)          54      65     48     53    69     59   60      52      59     67
 1(d)          46      59     45     46    59     58   53      ?3      63     69
 1(e)          46      43     35     37    47     4?   42      46      40     40
 2             50      50     33     33    56     45   49      40      46     50
 6(a)          5?      55     56     54    64     54   51      47      59     ??
 7             64      67     61     63    69     65   66      64      6?     6?
 8             7?      75     76     67    ??     77   79      69      7?     ??
10             54      6?     63     65    67     5?   7?      66      60     63

(? marks a digit illegible in the source copy.)


According to expectation, the oldest men derive less personal benefit 
than the younger men. In spite of this, a third of the group indicate 
their future decisions will be influenced by their counseling. They report 
increased self-confidence in the same degree as the younger men. Also, 
they support self-financed counseling as strongly as the younger group. 

In the analysis for years of education, the group with a grammar
school education or less report less favorably than the other groups.
However, 33 per cent report that future decisions will be influenced, 54
per cent report increased self-confidence, 67 per cent feel it was a
worthwhile experience and 65 per cent recommend self-financed counseling.
The group with (9-11) years of education report the most positive
benefits. Favorable test comparisons for many in this group with the
high school graduates contributed to the more positive replies. The
results for the group who are high school graduates and the group with
some college are in general comparable. The latter group support
self-financing more positively than any of the other groups.

In the case of the ability groupings, the highest ability group report
most favorably. However, the uniformity in the ability groups is more
striking than are the differences.

The counselors had anticipated less favorable replies for the men
with fewer years of education and lower test results. In many cases, the
limited educational background and low test results on all tests used
necessitated basing recommendations chiefly on work experience and
[illegible]. Some counselees resented this, as indicated by the adverse
comment by a “minishinest,” aged 47, with less than an 8th grade
education and a test percentile below 30: “The test only gave me the same thing
the concolr talked of so the time it took to answer was wasted. I
think as they did not tell me anything I did not know. I expected to
learn a lot but was dissiponted.”

In contrast, another counselee aged 42 years in the same educational
and test groups replied affirmatively to 1(a) (b) (c) (d), 5(a), 6(a), 7, 8,
and 10. He added the comment: “Let me add my thanks and
appreciation to the parties who conducted the tests and interviews. They were
handled in a very friendly manner that made one feel they had your
personal interest at heart.” The tabulated results reflect more support
for the attitude reflected in the latter statement.


Table 3
Group Comparisons with Total Plant Men

                                            Disagree with Total Plant Men
                                 Agree         Lower         Higher
Group I
  No. = 23                       1(e)          1(a)          9
  Age 41 or older                6(a)*         1(c)
  Median age 50                  8             1(d)
  Education 8th or less          10            2
  Percentile 30 or less                        3

Group II
  No. = 32                       1(a)          3(a)          1(e)
  Age 26-40                      1(c)                        2
  Median age 30                  1(d)                        10
  Education some college         6(a)
  Percentile 70 or more          7
                                 8
                                 9




A comparison is presented in Table 3 of the replies for men over 41 
years in age with an 8th grade education or less, who fell below the 30 
percentile on the Adult Placement Test, with the men aged 26 to 40 
years with some college education, whose test percentiles were 70 or 
higher. Items on which each group agree or disagree with the total plant 
men are shown. 

The older, less able group report less favorably than the total plant 
men on questions related to self-appraisal and vocational decisions; 
equally favorably on increased self-confidence, counseling as a worth- 
while experience, and self-financing; and more favorably on counseling 
as a plant practice. 

The younger, more able, better educated group report more favorably 
than the total plant men on possible job transfers, influence of counseling 
on future vocational decisions, and self-financing of counseling. 


Table 4
Affirmative Responses of 171 Women to Counseling Questionnaire Classified by
Age, Education and Tested Ability

                    Age               Years of Education        Ability Percentile
            (18-25) (26-40) (41+)   (-8) (9-11) (12) (13+)   (1-30) (31-70) (70+)
No. Cases      81      55     35     13    31     89   38      33      73     65
                %       %      %      %     %      %    %       %       %      %
Question
 1(a)          80      65     80     4?    7?     78   81      ??      ??     ??
 1(c)          64      71     57     62    48     65   78      58      68     65
 1(d)          64      62     46     46    55     62   63      52      66     57
 1(e)          ??      56     49     46    52     65   66      64      58     65
 2             54      42     46     38    42     48   58      45      45     54
 6(a)          60      49     60     54    58     53   66      55      52     63
 7             ??      ??     77     69    61     71   74      7?      67     89
 8             86      84     80     69    84     82   95      70      85     91
10             67      65     57     46    55     66   74      48      66     71

(? marks a digit illegible in the source copy.)


In Table 4, the affirmative responses for the women are presented.
Positive values are reported for all three age and educational levels, with
the older, less well educated women reporting less favorably on ways of
promoting personality development and job transfers.

Positive values are reported by all ability levels with the least
favorable replies for those in the lowest ability level for questions bearing on
personality in relation to fields of work, promotion of personality
development and self-financing of counseling. Both the middle and low ability
groups report less favorably on effect of counseling on future vocational
decisions. In general, the most favorable responses are for the highest
ability level.

As previously indicated, this survey was conducted for the enlightenment
of the counseling staff with respect to positive values reported and
with respect to possible improvements in the procedures.

A higher proportion of returns is considered desirable for representative
conclusions. Earlier in the project, however, the president of the
plant personally conducted his own survey of employees who had been
counseled. At that time, he received 80 per cent of favorable replies.
The counseling unit felt that the group which had been counseled to that
point was rather heavily weighted with older or less able men. This
fact and the agreement of the earlier survey with the questionnaire
returns provide some basis for regarding the latter as representative.


Attitudes Reflected in Counselees’ Comments 


Popular, much publicized articles have over-emphasized “the hidden
talents” and “the many aptitudes” revealed by aptitude testing.
Consequently, many individuals whose results do not reveal startling “new
directions” or a “highway to success” experience a natural disappointment.
Qualified vocational counselors are modest in their claims as to the
benefits of counseling. They are aware of the limitations in their
techniques, of the many economic and social factors preventing full use of
recommendations, and of the emotional resistances in counselees against
accepting and acting upon interpretations which conflict with their own
self-evaluations or their needs for emotional security and prestige.

A considerable proportion of respondents added comments to
supplement their replies. Samples which reflect various attitudes of both a
critical and a favorable nature merit report and discussion.

Skepticism is expressed by a young college man in the high ability
group as to whether the counselee should be given his results: “It is O.K.
for the employer, but how much the individual should or need be told
concerning his merits is debatable. Many middle group individuals
are better performers than the so-called top-flight brains, because of the
differences in character which tip the scales back to offset the original
advantage. Why discourage the plugger by telling him he can’t possibly
reach his goal? Many will if they don’t know that they haven’t got the
ability.” The evaluation of those compensatory assets which offset
less favorable test results constitutes the chief difference between com- 
prehensive vocational counseling and aptitude testing. The respondent 
is not speaking from first-hand experience when he implies that the 
counselor tells the individual “he can’t possibly reach his goal.” The 
emphasis in each case is on the most positive potential possibilities of the 
individual. In arriving at these, some goals are indicated as less promis- 
ing in returns for effort expended. 

Another common misunderstanding is reflected in the comment of a 
fifty year old man who was a war-time employee. He had formerly been 
a photo-engraver. His results provided most positive support for re- 
turning to his former occupation. He stated, “I don’t want to because 
I knew for years that I wasn’t a whiz at that job and never would be. . . . 
I'd say that for me it (the counseling) was a bit wasteful of time although 
I found it interesting.” A point requiring constant re-interpretation is 
the fact that recommendations for certain kinds of work do not carry an 
inherent guarantee of successful competition with others so engaged. 
On the contrary they indicate that chances for productivity are relatively 
better in the suggested fields than in other fields considered. From this 
man’s comment, it is apparent that this was not sufficiently clarified in 
his interviews with the counselor. 

Another comments, “I enjoyed taking the tests and talks with the 
counselor. . .. What I objected to most was methods of comparison. 
Instead of using John Doe, same age, same type community and same 
general type, we were given dozens of comparative types.” This point 
is well taken. However, comparisons must necessarily be made in terms 
of standardization groups. This criticism overlooks the fact that plant 
norms were established for three of the basic tests given to the entire 
plant personnel. Norms were established for 16 occupational groups, 10 
educational levels, and for the total plant group for the Adult Placement 
Test and the Psychological Corporation General Clerical Test. Norms 
for four occupational and two age groups for men, and for two
occupational groups and two age groups for the women were established for
the Revised Minnesota Paper Form Board Test. Comparisons were 
reported with the appropriate plant groups as well as with the stand- 
ardization groups. The “dozens of comparative types” mentioned sug- 
gests the possibility that this counselee was given more comparisons 
than he needed or could assimilate. 

Another man reports, “I sincerely believe that I wasted my time and
[illegible] taking the tests. I was not in a proper frame of mind
when I took them. . . . I would sincerely like to go through it again with
[illegible].” There was evidence that the personality
of the counselor did influence the attitudes toward the counseling
results. An analysis of the questionnaires according to who had done the
counseling reflected more favorable replies for certain of the counselors.

In many instances, transfers were made within the plant on the basis
of the counseling results. Apropos of such changes is the following
comment: “Since the counseling I have been shifted by the management
to a job that is much more in line with my aspirations. I am very
well satisfied with the change.”

Excerpts from other favorable comments include: “I think it was a
great opportunity and intend to follow it as closely as possible.” “I
think it is a good thing because it is so easy to be mistaken by wishful
thinking.” “. . . It helped me a great deal in planning my future.”
“I used my counselor’s summary to good advantage in obtaining the
above position.” “. . . Gave me a better idea of opportunities for
employment of one of my age.” “. . . Gave me confidence in myself to
do the kind of work I’ve always wanted to do.” “. . . Gave me good
insight into the future in the respect of family affairs.” “. . . Received
several helpful suggestions that have proved beneficial in my everyday [illegible]

Placement Follow-up of Veterans


The second project referred to above afforded the opportunity to
check placement outcomes against counseling recommendations. The
more subjective evaluations of the benefits of counseling could be
compared with practical results.

A Veterans’ Information and Placement Service was set up and
financed by Rockford business men and industrialists just subsequent to
the completion of the plant project. The vocational counseling for this
project was handled by the same counseling unit which functioned in the
Woodward Governor Company.⁵ A full time placement officer worked
independently but cooperatively with the counseling unit in finding
employment for the veterans.

The veterans comprise the age-range least represented in the plant
studies. As a group, they represent a higher educational level than the
plant men. Fifteen per cent had less than a high school education,
compared to 34.7 per cent for the plant; 61 per cent were high school
graduates, compared to 45.3 per cent for the plant; and 24 per cent had
some college training or were college graduates, compared to 20
per cent for the plant. Twenty-two per cent were commissioned officers,
78 per cent were non-commissioned officers, technicians, or privates in
the Services.

⁵ Miss Olive Bray left the staff at this time to set up a counseling unit at Rockford
College for Women.




When he applied for the counseling service, the veteran agreed that 
the placement officer was to be given a copy of the vocational recom- 
mendations. In order to make the latter as meaningful as possible, a 
code was developed which indicated whether the veteran had the sup- 
porting qualifications for a suggested occupation in high degree, in 
moderate degree, or in questionable degree; also whether he needed 
further education, on-the-job training, or work experience. 

Since a number of alternatives was listed for each veteran, the code 
enabled the placement officer to guide the men into those activities for 
which they were best qualified. Periodic check-ups were made with 
both the employee and the employer after placement. The records of 
placement and follow-up were kept by the placement officer. The 
analysis of the results was made by him and made available to the writer. 

The results reported were compiled at the time that 516 veterans had 
completed their counseling. Of this number, 444 were available for 
follow-up in Rockford. Of these, 82.4 per cent were satisfactorily placed 
in recommended jobs according to their own and their employers’ state- 
ments; 10.9 per cent had been placed in other jobs. At the time of the 
last follow-up, 7 per cent were not yet employed. The last group in- 
cluded a number of men in upper economic brackets who had not sought 
employment. 

The employment stability record was one of the most outstanding 
results of the veterans’ placement service. Of those employed, only 
eleven men, less than 3 per cent, had changed jobs in a period of 19 
months. 

Labor turn-over was high in the early post-war period. For the mid- 
months of the veterans’ counseling project (January and February, 1946), 
the monthly turn-over for eleven major U. S. industries (5) ranged from a 
low of 5.3 per cent to a high of 13.1 per cent; for manufacturing it was 6.8
per cent. Since Rockford is a manufacturing center, the comparison 
between the latter figure and the record of the veterans’ placement service 
provides some basis for judging the value of counseling prior to placement. 
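
The comparison can be made concrete with a little arithmetic. The following is a rough sketch, not a calculation reported by the author; the names are illustrative, and it assumes that the placement percentages apply to the 444 veterans followed up and that a simple per-month average is an acceptable summary. Published monthly turn-over rates may be defined differently from a count of individuals who changed jobs, so the figures are indicative only.

```python
followed_up = 444                  # veterans available for follow-up
placed_share = 0.824 + 0.109       # placed in recommended jobs + placed in other jobs

# Approximate number employed (assumption: shares refer to the 444 followed up).
employed = round(followed_up * placed_share)

changed_jobs = 11                  # men who changed jobs
months = 19                        # length of the follow-up period

# Per cent of employed veterans who changed jobs over the whole period.
share_changed = 100 * changed_jobs / employed

# Crude average per month, for comparison with a monthly turn-over rate.
avg_monthly = share_changed / months

manufacturing_monthly = 6.8        # per cent per month, as quoted in the text

print(employed, round(share_changed, 2), round(avg_monthly, 2), manufacturing_monthly)
```

Under these assumptions the share who changed jobs over the full 19 months remains below the single-month manufacturing figure, which is the contrast the text draws.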


Summary 


Analyses of replies to a questionnaire designed to gauge counselees’
estimates of the benefits of counseling have been presented. The results
were analyzed separately for employees remaining in their jobs and those
who had left for other work or training. Results were also analyzed
according to age, education, ability level, and sex of counselees.

The trend of the questionnaire returns provides strong confirmation
that the counseling had significant positive values for the participants.
The results also confirm the philosophy underlying the framing of the
questionnaire. That is, that the values vary from individual to
individual. The fact that the highest per cent (82) of positive replies is to
Question 8, “Do you feel that your counseling was a worthwhile
experience?”, may be regarded as evidence that the values are distributed
among the areas covered by the other questions. More favorable reports
were made by those who had had the occasion to use the results of their
counseling, and by those who anticipated post-war job transfers.

Volunteered comments have been reported which throw light upon
the common misconceptions of counseling, on necessary cautions
to be observed by counselors, and on the varied positive contributions to
individuals counseled.

A supplementary report is included of placement follow-ups for
veterans who had been placed in accordance with the results of their
counseling. Of 444 veterans available for follow-up, 82.4 per cent were
satisfactorily placed according to their own and their employers’ statements.
In a period of 19 months, less than 3 per cent had changed jobs.

This report has not included the replies to the final question on the
questionnaire with respect to recommended age for counseling. These
results have been analyzed in a separate report (4).


Received January 17, 1949.

References

1. Long, L., and Hill, J. A follow-up study of veterans receiving vocational
   advisement. J. consult. Psychol., 1947, 11, 88-92.
2. Brown, M. T. The veterans report one year later. Occupations, 1947, 25, 209-212.
3. [illegible]e, E. P. Summary report of veterans in training. Occupations, 1947, 25,
   340-342.
4. Anderson, Rose G. Preferred ages for vocational counseling. Occupations, 1948,
   [illegible], 77-81.
5. Monthly Labor Review, U. S. Bureau of Labor Statistics, May, 1946.


Vocational Interests of Accountants 


Edward K. Strong, Jr. 
Stanford University 


In scoring Strong’s Vocational Interest Blank, three scales have been 
employed to measure the interests of men employed in office-accounting 
work (5). These are: 


1. Office worker, representative of office activities in business concerns 
including bookkeepers, purchasing agents, credit managers, and office 
managers. 

2. Accountant, based upon the records of 160 general accountants, 
54 cost accountants, 65 auditors, and 66 comptrollers and treasurers. 

3. Certified Public Accountant, so certified in the states of New York
and California.


Accountants and CPA’s were further differentiated in developing the 
scales so that the former represented men regularly employed by business 
firms whereas the latter were employed by public accounting concerns. 
Among the former were some men holding the CPA certificate. But 
they were classified as accountants, not CPA’s on the basis of the em- 
ployer for whom they worked rather than on the basis of whether or not 
they had the CPA certificate. 

A fourth scale is related to the above, i.e., Purchasing Agent, composed 
entirely of purchasing agents, whereas the Office Worker Scale is primarily 
representative of bookkeepers and related office activities with a smaller 
representation of men holding more advanced positions in office work. 

Since these scales were developed there have been many queries re- 
garding the relationship between the Accountant and CPA Scales, be- 
cause each group scored about 40 on the other scale but the correlation 
between them was only .28.

The primary question concerning us here is: do the Accountant and 
CPA Scales measure what they purport to measure? 

In 1943 a survey was made of members of the American Institute
of Accountants by Dr. Ben Wood and Dr. A. E. Traxler, under the 
direction of the Committee on Selection of that Institute. The survey 
included tests of: (a) ability or achievement, with which we are not here 
concerned; and (b) the Vocational Interest Test. Data on the latter 
were obtained from 1856 accountants (1). Additional records of 1117 




accountants in Canada were secured under the auspices of the Dominion 
Association of Certified Accountants (2). 

Occupational interest scores were determined for 1000 of the American 
accountants, 200 from each of five sub-groups, namely, partners, man- 
agers, seniors, semi-seniors, and juniors. The median score on each of 
twenty-three scales agreed very closely with the medians based on the 
1117 Canadians. 

On the basis of these data the present Accountant Scale is judged 
adequate for juniors since the distribution of letter ratings of 314 juniors 
agreed very closely with the distribution of ratings of the criterion group. 
See Table 1. 

Table 1
Percentage Distribution of Letter Ratings on Accountant Scale of 314 Junior
Accountants and Members of Criterion Group upon
which the Scale is Based (2)


Letter                 Criterion
Rating     Juniors     Group
              %           %
A           65.0        69.2
B+          16.2        15.0
B           12.1         9.2
B-           3.2         4.4
C+           2.9         1.6
C            0.6         0.6


At first thought it is surprising that the Accountant Scale should 
represent the interests of junior accountants. The Accountant Scale, as 
pointed out above, represents not merely men of junior level but also to 
some extent men of higher levels up to the top ranks of comptroller and 
treasurer. The explanation may lie in the likelihood that the group of 
juniors contains within it men who are destined, later on, to reach
intermediate and top levels of accounting work in a business concern. If
this is the situation it is support for the writer’s procedure of
developing scales for an occupation rather than a position or level
within an occupation.

The present CPA Scale, on the other hand, does not properly represent 
the interests of the great majority of public accountants. See Table 2. 
None of the sub-groups, except partners, scores high enough for the Scale 
to be considered as representative of their interests. The distribution 
of scores of partners is such that the Scale can be used to represent their 
interests fairly well, although it is to be expected that partners will secure 
somewhat fewer A ratings and more B and B- ratings than should be
expected from a scale that adequately represents them.


This raises the question, when may two groups be considered to be
one group and when two separate groups? What objective criteria may
be set up to answer this question? Use of critical ratios, or their
equivalent in terms of level of significance, is not applicable. Two
means may be significantly different and at the same time common sense
indicates that the two groups differ too little to be divided into two
separate groups. Percentage of overlapping appeals to the writer as
probably the best criterion to use in this connection.¹ There is,
however, no accepted agreement as to what percentage of total
overlapping should be used as the cutting point. Using the data in
Table 2, we have 87.9 per cent overlapping between scores of the
criterion group and partners but only 71.7 per cent between the former
and managers. Our present judgment is that 88 per cent is too high an
overlapping to consider the two groups separate groups and that 72 per
cent, on the other hand, is too low to include the two in one group. In
making this statement we have in mind the overlapping of a considerable
number of pairs of groups. One difficulty in arriving at the proper
cutting point lies in the fact that popular opinion respecting whether
two groups are similar or not and percentage of overlapping do not
correlate perfectly.


Table 2

Percentage Distribution of Letter Ratings on Original CPA Scale of Five Levels of
Accountants and of Members of the Criterion Group (2)


Letter    Criterion     283       226      582       361        314      1766
Rating    Group       Partners   Mgrs.    Snrs.   Semi-Sen.   Juniors    Total
             %           %         %        %         %          %         %
A          72.6        60.8      44.3     36.8      38.8       31.2      41.0
B+         13.6        14.8      19.0     20.3      16.6       18.5      18.2
B           6.2        12.0      20.4     19.1      18.3       16.9      17.5
B-          3.7         8.5       9.7     12.7      12.7       17.2      12.5
C+          2.5         2.8       4.4      6.7       7.2       11.1       6.7
C           1.4         1.1       2.2      4.4       6.4        5.1       4.1
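
The two overlap figures quoted from Table 2 can be checked with a short sketch. The text gives no formula, so the calculation below is an assumption: overlap is taken as the sum, rating by rating, of the smaller of the two groups' percentages. Under that assumption the Table 2 columns reproduce the reported 87.9 and 71.7 per cent. The function name `percent_overlap` is illustrative only.

```python
# Letter-rating percentages (A, B+, B, B-, C+, C) as read from Table 2.
CRITERION = [72.6, 13.6, 6.2, 3.7, 2.5, 1.4]
PARTNERS = [60.8, 14.8, 12.0, 8.5, 2.8, 1.1]
MANAGERS = [44.3, 19.0, 20.4, 9.7, 4.4, 2.2]


def percent_overlap(dist_a, dist_b):
    """Sum, category by category, the percentage common to both groups.

    Assumed definition: the text reports overlap figures but not the formula.
    """
    if len(dist_a) != len(dist_b):
        raise ValueError("distributions must use the same rating categories")
    return sum(min(a, b) for a, b in zip(dist_a, dist_b))


print(round(percent_overlap(CRITERION, PARTNERS), 1))  # 87.9
print(round(percent_overlap(CRITERION, MANAGERS), 1))  # 71.7
```

Because the measure depends only on the two percentage distributions, it is unaffected by the sizes of the groups, which is the property that distinguishes it from a critical ratio.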


Senior CPA Scale 


A new scale has been developed based on the interests of 611 Senior
CPA's. This Scale will be designated as “Senior CPA” in
contradistinction to the old CPA Scale, to be referred to from now on as
the Partner CPA Scale. The new Scale has a reliability of .89, which is
much higher


1 Percentage of overlapping is the percentage of scores of one group which
may be matched with scores in the other group.


Table 3


Mean Scores of Office Men and Three Groups of Accountants on Their
Four Scales and Also on the OL Scale


                               Occupational Groups
               Office Man     Accountant     Senior CPA     CPA Partner
Scale           M      σ       M      σ       M      σ       M      σ

Office Man     49.4   10.4    46.9    9.8    43.9    9.5    35.8   11.0
Accountant     43.6   11.4    50.3    9.6    47.7    8.4    41.1   11.7
Senior CPA     [illegible]    [illegible]    49.7   10.1    [illegible]
CPA Partner    31.6    9.8    39.3   11.0    42.4    9.4    50.1   10.4

OL             57.1    7.6    59.6    8.0    57.4    6.3    62.7    7.0

[...]) of the four scales of Office Worker, Accountant,
Senior CPA and Partner CPA are shown in Table 3. Scores on the Office
Scale go down from office worker to CPA Partners and, in reverse, scores
[...]

Senior CPA’s who hold an intermediate position between juniors and
partners have interests more akin to juniors than to partners as far as
scores on the four scales go. This is also true with respect to scores on

Table 4


Correlations between Interests of Senior CPA’s and Other Occupations and
Mean Scores of Senior CPA’s on the Various Scales


   r     Scale                          Mean       r     Scale
  .72    Mathematics-Science Teacher    38.6      .10    Banker
  .72    Accountant                     47.7      .07    Partner CPA
  .64    Policeman                      30.2      .06    Mathematician
  .58    Office Worker                  43.9      .03    City Sch. Supt.
  .53    Production Manager             38.6      .02    Psychologist
 [...]   Printer                        30.8     -.01    Musician

 [...]   [...]                          29.3     -.07    Minister
 [...]   [...]                          24.6     -.08    Dentist
 [...]   [...]                          20.6     -.21    Architect




The correlations between the interests of Senior CPA’s and men in 36
occupations are given in Table 4. The correlations with accountant and office
worker are .72 and .58, but only .07 with partner CPA’s. The latter 
low correlation is similar to .28 between accountant and partner CPA 
and .06 between office worker and partner CPA. The interests of CPA 
partners are clearly quite distinct from the interests of office workers, 
junior and senior accountants. This relationship is shown clearly in 
terms of OL scores. Partner CPA correlates .43 with OL whereas the 
other three scales correlate negatively with OL, i.e., —.26 with accoun- 
tant, —.33 with office man and —.64 with senior CPA’s. The difference 
here is very large between partner CPA and the other three scales. 

In line with the low negative correlation with OL the interests of 
senior CPA’s are correlated positively with occupations in Group IV, 
i.e., mathematics-science teacher, policeman, printer, aviator, forest 
service, carpenter and somewhat lower with farmer. Actually the 
Senior CPA Scale correlates on the average of .54 with the seven occupa- 
tions in Group IV, whereas the Accountant Scale correlates only .15, 
the Office Scale .09, and the Partner CPA Scale correlates -.44. This
is a most unexpected relationship. The interests of Group IV typify 
mechanical interests. Why should senior accountants exhibit such in- 
terests and also why should they exhibit more mechanical interest than 
partners, on the one hand, and juniors on the other hand? 

To what occupational group should the new scale be assigned? Up 
to the present time Group VIII has contained purchasing agent, office 
worker, accountant and banker. But senior CPA cannot be grouped 
with these four since it correlates only .33 with purchasing agent and .10 
with banker. On the basis of correlations of .60 and over senior CPA’s 
should be grouped with mathematics-science teacher (.72), accountant 
(.72), policeman (.64), and, if we stretch a point, office worker (.58). 
If high mean scores are considered, senior CPA should be grouped with 
accountant (score of 47.7 on that scale), office worker (43.9), partner 
CPA (42.4); and in the neighborhood of public administrator (39.9), 
purchasing agent (38.7), and production manager (38.6). 

Ordinarily, as the correlation between two interests goes down, the
mean scores of each on the other’s scale also go down. But here we have
an exception to this relationship. For example, the interests of senior
CPA’s correlate .72 with mathematics-science teacher but they average
only 33.7 on that scale. Similarly the correlation with policeman is .64
but the mean score on that scale is 30.2. In grouping occupations it
appears desirable to take both correlation and mean score into account.
The writer, however, has found no statistical way of combining the two
measures. About all that can be done is to group occupations on the
basis of both measures on a common sense basis. Upon this basis we
would disregard the high correlations between senior CPA and both
mathematics-science teacher and policeman because of the low mean
scores. Another reason for doing this is that occupations having no
obvious connection in everyday life should not be included in the same
group. To ignore this criterion might result in occupational groups
which are useless as far as guidance and selection are concerned (5).

For the time being, we suggest a division of the present Group VIII
into Group VIIIa composed of purchasing agent and banker and Group
VIIIb composed of accountant, senior CPA and office worker. The two
sub-groups differ too much to be considered as one occupational group.

In terms of the Interest Global Chart, senior CPA should be located
below accountant in the direction of Group IV but its exact location can

CPA Partner Scale


We hope to revise the original CPA Scale before long, from now on
to be called the CPA Partner Scale. Until that is done we believe the
present scale is of distinct value. It reflects interests of a managerial
type not particularly reflected in either the new Senior CPA Scale or the
old Accountant and Office Worker Scales. The data in Table 2 make
clear that scores on this scale decrease as one goes from partner to
manager to rank-and-file accountant.

The fact that the CPA Partner Scale correlates highest of all with
the Lawyer Scale (i.e., .57) but only .28 with accountant, .07 with Senior
CPA and .06 with office worker, indicates to the writer that the CPA
Partner and Lawyer Scales express the interests involved in dealing with
a client regarding accounting or legal matters, an activity not common
to the non-managerial employee.

Again and again in studying the interests of members of an
organization the fact is brought out that there are real differences in
the interests of the rank-and-file of employees and the interests of the
administrators or executives in the top levels of management (4, 6, 7).
We believe such differences are reflected in the noticeable differences
in scores between the Senior CPA and the CPA Partner scales.

Existing data indicate that only a small minority of the rank-and-file
of employees possess the interests that characterize the top managerial
group. The evidence so far supports the assumption that it is from this
minority that the future managers will come. If this is true it is most
important to identify the minority early in their employment and to
afford them special opportunities to prepare for advancement. The
writer does not feel that the above point of view is more than a good
working hypothesis today. It is possible that interests may change
with promotion and increasing responsibilities. If this proves to be
correct it means that an organization should give attention to the early
training of its superior subordinates not only in procedures but also as
regards interests. How the latter is to be done, if it can be done, is
certainly not definitely formulated today.


Occupational Interest Scores of Accountants


Table 5 gives the occupational interest scores of six groups of
accountants on the 13 scales on which 1000 public accountants score the
highest. The 1000 accountants are composed of 200 each of partners,
managers, seniors, semi-seniors and juniors. The scores of students in
accounting approximate the scores of the thousand accountants but differ
by having higher scores in personnel and sales management and lower
scores on accountant, engineer, and especially CPA Partner. The
differences are presumably due to the expectation that some of the
students will not go into accounting work and that these students have
less interest in accounting than the remainder.


Table 5
Occupational Interest Scores of Six Groups of Accountants


Note: Data in first three columns are medians, after Wood and Traxler; remaining
data are mean scores.


                         1000                 100
                         Public     100       1st       100       100        100
                         Account-   Acct.     Year      Senior    Account-   Office
                         ants       Seniors   Students  CPA’s     ants       Men
Accountant               47.5       44.6      40.3      47.7      50.3       43.6
CPA Manager              42.5       36.8      32.8      42.4      39.3       31.6
Office Man                -          -         -        43.9      46.9       49.4
Production Manager       39.8       38.2      37.0      38.6      39.2       36.2
Purchasing Agent         39.6       38.5      39.6      38.7      41.9       42.2
Banker                   36.7       37.5      35.3      35.7      39.0       37.8
Personnel Manager        35.2       40.7      40.2      35.3      34.6       29.6
President                35.2       34.5      35.8      32.6      34.4       34.4
Realtor                  34.7       [illegible]         34.1      35.6       30.2
Sales Manager            33.9       38.4      40.3      32.4      35.5       36.9
Math. Science Teacher    33.4       33.2      31.2      33.7      30.1       27.9
Engineer                 32.3       23.2      22.6      31.8      28.1       25.7
[illegible]              31.4       30.8      30.8      30.1      29.3       28.2
OL                        -          -         -        63.1      59.6       57.1


The office men and accountants, representative of juniors, differ from
[...] in higher scores on office interest and lower scores on CPA Partner
interest and OL.

Unfortunately Wood and Traxler did not score their blanks on the
Office Men and OL Scales. These scales throw as much light on the
relationship between the levels of accountants as any. The writer has
found that the office man scale is the best single scale for indicating
business interest. All business men from president to office man spend
a good share of their time looking at and shuffling papers. When a
counselee scores low on this scale one should have him consider other
occupations than typical business activities.

Received September 6, 1949.
Early publication.

References

1. [...] of Committee on Selection of Personnel. American Institute of
   Accountants, Jan. 15, 1945.
2. Committee on Selection of Personnel. A study of the ability of
   accounting students. American Institute of Accounting, 1946, Bull. No. 1.
3. [...] college and professional accounting testing programs. American
   Institute of Accounting, 1947, Bull. No. 3.
4. [...] 1949, M.A. Thesis, Stanford University.
5. Strong, E. K., Jr. Vocational interests of men and women. Stanford
   University Press, 1943.
6. [...] 1945, 5, 151-171.
7. Strong, E. K., Jr. Interests of senior and junior public administrators.
   J. appl. Psychol., 1946, 30, 55-71.
8. Tilton, J. W. Measurement of overlapping. J. educ. Psychol., 1937, 28,
   250-62.


Vocational Interests of Psychologists * 


Philip H. Kriedt 
Prudential Insurance Company, Newark, N. J. 


This study was undertaken with the general objective of making the 
Strong Vocational Interest Blank a more useful tool for the guidance of 
both beginning and advanced psychology students. More specifically, 
the study proposed: 1. to determine the adequacy of the 1938 S. V. I. B. 
(Strong Vocational Interest Blank) psychologist key and to construct a 
new key if it seemed necessary; 2. to develop interest profiles for various 
sub-groups of psychologists based on the scores of these sub-groups on all 
the 1938 S. V. I. B. keys; and 3. to construct new keys for several sub- 
groups of psychologists so that better differentiation between sub-groups 
could be secured. 

The Pilot Study 


As a pilot study, 95 prominent psychologists who could be classified 
quite clearly as experimental, social, guidance, statistical, or industrial 
psychologists were asked to fill out the S. V. I. B. By using three follow- 
up letters, returns were received from 92 of the 95. Analysis of these 
data indicated that the 1938 psychologist key was not satisfactory for 
this group. Only 56% of them received scores of A or B+ instead of the 
84% that would be secured if the key were completely appropriate. This
difference is statistically significant (p < .01). Analysis of the interest
profiles of the five sub-groups showed sufficient differences between the 
sub-groups to warrant the collection of more data of this sort in order 
to construct sub-group keys. 


The Main Study 


For the main study, begun in April, 1948, all male psychologists who 
had received a Ph.D. before 1943 and whose addresses were listed in the 
1948 APA Directory were asked to fill out a S. V. I. B. Three follow-up 
letters were needed in order to obtain 1048 (89%) usable returns. Anal- 
ysis of the age and major field of experience of those who had not replied 
indicated that the sample obtained is not a biased one in those respects.

* This article is based [...] Psychologists,” and was completed in June,
1949, at the University of Minnesota.


Each psychologist furnished information regarding his professional
experience which made it possible to classify him according to “field”
and in most cases according to “function” also. The numbers in each field
classification are: 256 experimental, 221 clinical, 154 educational, 115
guidance, 108 industrial, 69 social, 65 statistical, 44 child, and 16
marketing. The numbers in each functional classification are: 295
teaching, 184 research, 146 service, and 128 administration. (There were
315 who were not given a functional classification.)


Profile Analysis 


Median scores on the 42 present keys of the S. V. I. B. for these 13 
sub-groups are presented in Tables 1 and 2.¹ Psychologists-in-general
have median scores of A on the psychologist and public administrator 
keys; median scores of B+ on the chemist and personnel director keys; 
and median scores of B on the artist, architect, physician, mathematician, 
engineer, math and physical science teacher, city school superintendent, 
advertising man, lawyer, and author-journalist keys. 

The rank order correlations between the median profile for all
psychologists vs. each of the 13 sub-group profiles are as follows:
clinical .98, social .96, child .94, educational .93, statistical .90,
experimental .86, guidance .77, marketing .47, teaching .96, service .90,
research .86, and administration .81. Marketing psychologists are the
most deviant group and their profile shows that they are characterized by
stronger sales and office detail interests than other psychologists.
Guidance psychologists deviate in the social service direction. Research,
experimental, and statistical psychologists are distinguished because of
their physical science-biological science interests. Administrators have
higher social service and production manager scores than most
psychologists. The industrial sub-group differs from
psychologists-in-general largely because of stronger production manager
and office detail interests. The service sub-group have verbal and social
service interests which distinguish them. The other sub-groups (teaching,
child, educational, social, and clinical) have median profiles which are
very similar to the total group profile.
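
As an illustration of how a rank order correlation between two median profiles may be obtained, the sketch below ranks each profile and correlates the ranks (Spearman's method, with tied scores given the average of their ranks). The article does not say which computing formula was used, so this is an assumption, and the two five-key profiles are hypothetical figures, not data from the study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_order_correlation(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical median scores on five keys (not taken from the study).
all_psychologists = [48, 45, 40, 33, 27]
sub_group = [46, 47, 38, 30, 31]
print(round(rank_order_correlation(all_psychologists, sub_group), 2))  # 0.8
```

With the 42 keys of the blank in place of five, the same calculation would give one coefficient per sub-group profile.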


The 1948 Psychologist Key 


A new psychologist key was constructed by contrasting the responses
of these 1048 psychologists with those of Strong’s 1938 professional
men-in-general group. Strong’s method of assigning item weights was
followed. This new key differentiates psychologists from professional
men-in-general rather sharply. All of the psychologists exceed the mean
score for professional men-in-general. The mean standard score for
professional men-in-general is 17.5 as compared to 50 for psychologists.
In other words, using the standard deviation of the psychologist group
as the unit of measurement, the two means are 3.25 standard deviations
apart.


1 To reduce printing costs, Tables 1, 2, and 3 have been deposited with the American
Documentation Institute. Order Document 2693 from American Documentation
Institute, 1719 N Street, N.W., Washington 6, D. C., remitting $0.50 for microfilm
(images 1 inch high on standard 35 mm. motion picture film) or $1.00 for photocopies
(6 × 8 inches) readable without optical aid.
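
The figure of 3.25 follows directly from the two means if standard scores are scaled so that the psychologist group has a standard deviation of 10. That scaling is an assumption here; the text states only the two means and the result. A minimal sketch:

```python
PSYCHOLOGIST_MEAN = 50.0       # mean standard score of psychologists (from the text)
PROFESSIONAL_MEN_MEAN = 17.5   # mean standard score of professional men-in-general (from the text)
PSYCHOLOGIST_SD = 10.0         # assumed: standard scores scaled to an S.D. of 10


def separation_in_sd_units(criterion_mean, reference_mean, criterion_sd):
    """Distance between two means expressed in criterion-group S.D. units."""
    return (criterion_mean - reference_mean) / criterion_sd


print(separation_in_sd_units(PSYCHOLOGIST_MEAN, PROFESSIONAL_MEN_MEAN, PSYCHOLOGIST_SD))  # 3.25
```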

All 1048 psychologists were scored on this new key. Except for 
the marketing psychologists who have a mean score of B+, all the 
sub-groups have a mean score of A. Social, statistical, child, clinical, 
experimental, and research psychologists have high A mean scores; 
service and teaching psychologists have average A mean scores; guid- 
ance, educational, industrial, and administrative psychologists have low 
A mean scores. 

All 13 sub-groups score higher on the 1948 psychologist key than they 
do on the 1938 psychologist key. Industrial, guidance, and administra- 
tive psychologists show the greatest increase in scores and experimental, 
teaching, and research psychologists show the least increase. This 
means that if an individual has the interests of one of the last three types 
of psychologists, the 1938 key will differentiate his interests from those of 
professional men-in-general quite accurately. If, however, he has the 
interests of an industrial, guidance, or administrative psychologist, the 
1948 key is much more likely to give A and B+ ratings and presumably 
is a better reflection of the interest patterns of present-day psychologists 
than the 1938 key. 

Examination of the item weights for the 1948 psychologist key,
presented in Table 3, shows the ways in which the 1048 psychologists
included in this study differ from Strong’s 1938 criterion group.² The
1948 group is more socialized than Strong’s group, more interested in,
more tolerant of, and more willing to help people, and less interested in
mechanical and methodical work and in solitary activity. The 1948
group seems to have more of the interests of an applied psychologist and
Strong’s group more of the interests of a laboratory scientist. The
response weights for the two keys are different in 484 instances. Most
of the differences are changes of only one point, but 41 shifts involve
differences in weights of two or more points.

The practical significance of the difference between the two keys is
shown most clearly by the scatterplot of individual scores on the two
keys, presented in Table 4. The data show that these psychologists tend
to score higher on the 1948 key than on the 1938 key. For instance, 217

2 See footnote 1.




Table 4 


Grades of 1048 Psychologists on the 1938 and 1948 Strong 
Vocational Interest Blank Psychologist Keys 


1948 Psychologist Key 


A 3) 8 | 21 407 - 524.. 50* 
B+ 2 81 8 114 158  16* 
Lea pe ale a oo [ee ee seen 
B 2 8 #2 | 45 76 (148 14 
ologist Key B— 1 1 E e a 82 19 87 8 
\ C+ 2 2 138 121 22 3 54 5 
c 8 10 Bay dye As 77 7 
11 15 53 86 |161 722 1048 
116) 2: Oe ede A 100% 


* The difference between the 66% who score A or B+ on the 1938 key and the 84%
who score A or B+ on the 1948 key is statistically significant (p < .01).


score B or lower on the 1938 psychologist key and B+ or higher on the
1948 key while only 16 score B or lower on the 1948 key and B+ or
higher on the 1938 key. These results seem to indicate that the two
keys cannot be considered equally valid, and since the 1948 key is based
on a larger and more up-to-date sample it is recommended that the 1948
key be substituted for the 1938 key.
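
Since the same 1048 psychologists are graded on both keys, the two percentages are correlated, and one way to test the change is through the two discordant counts given in the text (217 and 16). The article does not name the test it used. The sketch below assumes McNemar's chi-square for correlated proportions, without continuity correction, purely as an illustration. The column totals 161 and 722 are the B+ and A totals read from Table 4.

```python
import math

RAISED_BY_1948_KEY = 217    # B or lower on 1938 key, B+ or higher on 1948 key (from the text)
LOWERED_BY_1948_KEY = 16    # B+ or higher on 1938 key, B or lower on 1948 key (from the text)
TOTAL = 1048
BPLUS_OR_A_ON_1948_KEY = 161 + 722   # column totals as read from Table 4


def mcnemar_chi_square(n_up, n_down):
    """Uncorrected McNemar statistic for paired two-way classifications (1 df)."""
    return (n_up - n_down) ** 2 / (n_up + n_down)


def chi_square_1df_p_value(chi_square):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


chi_square = mcnemar_chi_square(RAISED_BY_1948_KEY, LOWERED_BY_1948_KEY)
print(round(chi_square, 1))                                # 173.4
print(chi_square_1df_p_value(chi_square) < 0.01)           # True
print(round(100 * BPLUS_OR_A_ON_1948_KEY / TOTAL))         # 84
```

Under this assumed test the difference is far beyond the .01 level, in agreement with the footnote to Table 4.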


New Sub-Keys 


Sub-keys were constructed for four of the largest field sub-groups:
experimental, clinical, guidance, and industrial psychologists. Since
these keys were intended for the guidance of advanced psychology students
who might be undecided as to the field in which they should specialize, it
was felt that keys which contrasted the interests of each sub-group with
those of other psychologists would be most useful. Consequently, instead
of Strong’s professional men-in-general group, the 1948 sample of
psychologists-in-general was used as a reference point. Actually the
reference point was a slightly shifting one as the criterion group in each
case was subtracted from the total group. Strong’s method of
weighting responses was again followed.

The four criterion groups vary in size from 108 to 256. Although
these groups are not as large as Strong recommends, they are large enough
to give reasonably accurate results. Strong has found that keys
based on 250 cases are likely to differ from keys based on 500 cases by
only one or two standard scores while keys based on 100 cases are likely
to differ from those based on 500 cases by between two and eight standard
scores.³

The extent to which the sub-keys differentiate the four criterion 
groups from psychologists-in-general is presented in Table 5. Before 
analyzing these results, however, it may be well to consider some of the 
factors which have affected the degree of differentiation secured. 

In the first place, since all available members of each criterion group 
were needed in constructing the key for that group, it was not possible 
to rescore an independent sample. The separation we have obtained is 
consequently greater than would be expected for other samples. Sec- 
ondly, the shifting reference point which we chose to use in this study also 
tended to increase differentiation between criterion and reference groups. 


Table 5


Power of Four Strong Vocational Interest Blank Sub-keys for Psychologists to
Differentiate Criterion and Reference Groups


                  Criterion Group            Reference Group   Difference      Percentage
                                                               Between         of Criterion
                       Mean    Standard            Mean        Means in        Group Exceeding
                       Raw     Devia-              Raw         Criterion       Reference
Sub-key           N    Score   tion           N    Score       S.D. Units      Group Mean

Clinical         221   47.5    20.9          827    23.0          1.2             85.1
Experimental     256   42.9    38.8          792   -10.6          1.4             88.7
Guidance         115   50.0    37.6          933    -.1           1.3             91.3
Industrial       108   35.7    29.9          940    -3.9          1.3             88.0


On the other hand, the fact that all psychologists were forced into one
of the nine field classifications has tended to give us less separation than
if we had used only those psychologists who clearly fell into one of the
classifications. Finally, it should be pointed out that, other things being
equal, the smaller the criterion group, the greater will be its separation
from other psychologists. This is true because sampling errors can be
capitalized upon to a greater degree with a small group than with a
large group.

The results reported in Table 5 indicate that these four sub-keys
have about equal differentiating power. In all four instances the
reference and criterion group means are about 13 standard scores, or 1.3
standard deviations apart, and over 85% of the members of the criterion
groups exceed the means of the reference point groups. These four keys
do not secure the extremely sharp separation between criterion and
reference point groups obtained by the 1948 psychologist key, but it
would seem that they have sufficient differentiating power to warrant
their use. Moreover, the four sub-keys have the advantage of being
based entirely on recent data while the 1948 psychologist key, since it
contrasts the interests of present day psychologists with the interests of
Strong’s 1938 professional men-in-general group, may be somewhat out of date.

3 Strong, E. K., Jr. Vocational interests of men and women. Stanford
University Press, 1943, pp. [...].
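
The separations reported in Table 5 can be retraced from the other columns of the same table. The sketch below takes the criterion N, mean and standard deviation and the reference mean as best they can be read from Table 5, forms the shifting reference group as the total sample of 1048 less the criterion group, and expresses the difference between means in criterion S.D. units. The Guidance reference mean (-.1) and the Industrial criterion mean (35.7) are assumed readings, chosen because they agree with the printed differences; the function names are illustrative.

```python
TOTAL_PSYCHOLOGISTS = 1048

# (criterion N, criterion mean, criterion S.D., reference mean), read from Table 5.
SUB_KEYS = {
    "Clinical": (221, 47.5, 20.9, 23.0),
    "Experimental": (256, 42.9, 38.8, -10.6),
    "Guidance": (115, 50.0, 37.6, -0.1),     # reference mean is an assumed reading
    "Industrial": (108, 35.7, 29.9, -3.9),   # criterion mean is an assumed reading
}


def reference_n(criterion_n, total=TOTAL_PSYCHOLOGISTS):
    """Shifting reference point: all psychologists outside the criterion group."""
    return total - criterion_n


def separation(criterion_mean, reference_mean, criterion_sd):
    """Difference between means in criterion-group S.D. units."""
    return (criterion_mean - reference_mean) / criterion_sd


for name, (n, mean, sd, ref_mean) in SUB_KEYS.items():
    print(name, reference_n(n), round(separation(mean, ref_mean, sd), 1))
# Clinical 827 1.2
# Experimental 792 1.4
# Guidance 933 1.3
# Industrial 940 1.3
```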
An analysis of the response weights assigned to items for the four
sub-keys, presented in Table 3,⁴ indicates that clinical psychologists are
differentiated from other psychologists by greater artistic, literary,
teaching, verbal, and social service interests. Experimentalists have
stronger interests in physical science, mathematics, and mechanical work.
Guidance psychologists have a stronger preference than others for
interviewing, service to others, personnel work, and writing. Industrial
psychologists are distinguished by their business interests.


Correlation between Keys 


The relationships among the five new keys developed in this study are
shown by the following correlations based on a sample of 216 psychologists
not included in any of the four sub-key criterion groups. (To be exact
these correlations are not between keys but between the scores of
[...]


                   Psychologist    Clinical      Experimental    Guidance
                   Key             Key           Key             Key
Clinical Key        .30 ± .09
Experimental Key    .25 ± .09     -.52 ± .07
Guidance Key       -.32 ± .08      .28 ± .09     -.82 ± .03
Industrial Key     -.36 ± .08     -.13 ± .09     -.87 ± .08      .54 ± .07


Care must be used in interpreting these correlations. The positive
correlation between the clinical key and the psychologist key means that
clinical psychologists tend to differ from other psychologists in somewhat
the same way that psychologists-in-general differ from professional
men-in-general. The positive correlation between the guidance and
industrial keys means that guidance psychologists tend to differ in the
same way from non-guidance psychologists as industrial psychologists
differ from [...]


Conclusions 


It is suggested that better guidance for potential psychology students 
can be given if the 1948 psychologist key is substituted for the present 
(1938) key, and if consideration is also given to the profile scores for psy- 
chologists-in-general now available. It is also suggested that advanced 
psychology students who are undecided as to the field of psychology in 
which they should specialize, should take advantage of the four new sub- 
keys and the profile data for 13 different kinds of psychologists developed 
in this study. 


Received July 22, 1949. 
Early publication. 


5 The five keys developed in this study have been approved by Professor E. K. 
Strong, Jr. Interest blanks sent to Stanford University are now scored on the 1948 
rather than on the 1938 psychologist key. Engineers Northwest (100 Metropolitan 
Life Building, Minneapolis 1, Minnesota) is equipped to machine score both the 1948 
psychologist key and the four psychologist sub-keys on request.


Kuder Interest Patterns of University Business
School Seniors


Robert H. Shaffer 
Indiana University 


This paper reports the findings of an analysis of the mean raw and
percentile scores made on the Kuder Preference Record by Indiana
University School of Business seniors in the classes of 1947 and 1948.
Reference is made to the interest patterns which were found
characteristic of students majoring in the various curricula.


Procedure 


The Kuder Preference Record was administered to the 975 men and 205
women students in the graduating classes of 1947 and 1948. As a vocational
interest inventory it is widely used at the present time for vocational
counseling and, to some extent, in selection and placement. It
provides scores in nine general areas of preference: mechanical, computational,
scientific, persuasive, artistic, literary, musical, social service,
and clerical.


The mean raw scores with standard deviations were calculated for
the total groups by sex and for various sub-groups classed by
major. The means of the students grouped by major were compared
with the means of the corresponding scales of the total group by
the critical ratio technique.
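
The critical ratio comparison described above can be sketched as follows. The paper does not print its formula, so this is a minimal illustration that assumes the usual form: the difference between two means divided by the standard error of that difference, with the two groups treated as independent. Each major is in fact part of the total group, and how the author handled that overlap is not stated. The function name and the cut-offs of 1.96 and 2.58 are assumptions added for illustration. The figures are the computational-scale values for accounting majors and for all senior men, read from Table 1 with the standard deviations taken as 8.9 and 14.8.

```python
import math


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by the standard error of the difference.

    Assumes two independent groups (an assumption; the paper gives no formula).
    """
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_diff


# Computational scale, senior men: accounting majors vs. total group (Table 1).
cr = critical_ratio(57.1, 8.9, 217, 40.5, 14.8, 975)
print(f"critical ratio = {cr:.1f}")

# Conventional two-tailed cut-offs for a normal deviate (assumed, not quoted).
significant_5 = abs(cr) >= 1.96
significant_1 = abs(cr) >= 2.58
print("5% level:", significant_5, "| 1% level:", significant_1)
```

Under these assumptions the accounting group's computational mean lies far beyond the 1% cut-off, in line with the double asterisk shown for that entry in Table 1.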
Findings 
Table 1 gives the mean raw scores in the nine areas of the Kuder Preference
Record for the total group of business school seniors and for sub-groups
divided by major subject. Table 2 gives the percentile ranks
based upon the mean raw score as given in the published norms for the
[...]. Table 3 gives the raw scores for the women students and Table 4,
the equivalent percentile scores.
A comparison of the mean raw scores revealed that the business
school majors studied had varied, and in many cases, markedly different
interest patterns (Table 1). The accounting and advertising majors
were significantly different from the total business school group in
[...]. All of the advertising scores were significant at the 1%


Table 1 


Kuder Preference Record Mean Raw Scores by Major for Senior Men in 
Indiana University School of Business 


Major               N        Mech.     Comp.     Sci.      Pers.     Art.      Lit.      Mus.      Soc. Ser. Clerical

Total Group         975  M   64.8      40.5      54.7      97.0      39.7      51.4      19.0      69.4      61.1
                         SD  18.2      14.8      13.5      19.8      12.7      14.6      9.3       17.0      14.3

Gen. Business       179  M   62.3      40.9      54.1      96.2      40.8      51.8      19.8      69.5      60.8
                         SD  18.1      12.7      13.6      19.4      12.5      14.9      8.6       16.4      13.6

Accounting          217  M   67.9*     57.1**    58.3**    84.3—     37.6-     48.4—     17.1—     64.7—     70.3
                         SD  17.4      8.9       11.6      15.8      12.1      13.9      9.5       15.2      18.5

Finance and Banking 37   M   65.7      43.9      53.5      92.9      37.8      56.3**    20.0      67.6      61.1
                         SD  17.9      13.1      12.0      24.5      13.3      12.5      10.1      18.5      14.4

Management          138  M   68.6*     37.4—     53.9      96.2      37.3-     49.8      17.6      76.2**    61.5
                         SD  19.5      12.4      18.2      16.4      12.0      13.4      8.9       18.2      13.7

Advertising         107  M   59.7—     30.9—     49.9—     104.0**   45.8**    57.4**    21.5**    64.4—     57.0—
                         SD  17.2      11.6      14.9      18.4      13.9      16.5      9.3       15.3      13.3

Retailing           109  M   65.4      33.1—     54.7      105.1**   40.7      51.8      17.8      70.0      56.3—
                         SD  17.5      11.3      12.1      17.0      12.1      13.6      9.1       17.5      12.5

Sales               136  M   64.0      30.7—     52.3      112.3**   38.8      50.8      20.4      72.5*     55.6—
                         SD  18.2      9.8       18.4      12.2      11.3      14.1      8.9       15.2      11.6

Note: In Tables 1, 2, 3, and 4, * indicates a positive difference significant at the 5% 
level, ** indicates a positive difference significant at the 1% level, - indicates 
a negative difference significant at the 5% level, and — indicates a negative 
difference at the 1% level. 


Table 2 


Mean Kuder Preference Record Percentile Scores of Senior Men by 
Business School Major 


Major N Mech. Comp. Sci. Pers. Art. Lit. Mus. Soc. Ser. Clerical


Total Group 975 25 72 21 96 2 6 57 70 73 
Gen, Business 179 - 22 74 2 96 31 66 60 7 7 
Accounting 217 29* g7** 27+ gp 24- 5g 5I— 57— 89°" 
Finance 87 2% 681 1895 2 zes er 66 73 
Management 188 81" Gi— 20 og 23-0) Go o 53 82" 74 
Advertising 107 20— 36— 13- gg qye yg Gore 56— 63- 
Retailing 109 26. 45-20 og 32 66 52 67 60- 
Sales 136 24 35— 17 99** 27 62 62 76* 517 


Table 3 


Kuder Preference Record Mean Raw Scores by Major for Senior Women in 
Indiana University School of Business 


Major               N        Mech.     Comp.     Sci.      Pers.     Art.      Lit.      Mus.      Soc. Ser. Clerical

Total Group         205  M   47.5      33.2      47.1      80.0      51.2      54.3      22.3      76.9      67.3
                         SD  14.8      13.7      15.4      18.1      16.5      14.7      8.6       18.0      18.1

Gen. Business       32   M   46.4      40.2**    50.5      76.5      49.6      54.8      22.4      72.5      72.2
                         SD  14.9      13.1      14.0      17.8      13.6      15.3      6.9       18.6      13.2

[major not legible] 18   M   46.2      32.6      43.9      84.6      46.4      51.5      21.7      84.6      68.4
                         SD  16.9      12.5      12.9      17.4      15.1      11.2      6.8       16.5      12.7

Advertising         30   M   49.2      24.3—     42.0      87.4*     65.1**    57.1      23.2      66.0—     59.3-
                         SD  15.3      8.0       13.6      17.8      15.5      12.9      7.9       19.6      16.1

Retailing           24   M   48.7      31.7      43.3      89.7**    53.2      52.0      22.0      73.6      63.0
                         SD  16.3      10.2      12.7      15.1      15.4      14.4      7.5       15.8      12.9

Secretarial         26   M   41.5-     30.6      44.9      77.5      48.8      53.4      24.5      79.7      81.9**
                         SD  13.6      10.3      17.0      14.9      15.3      14.0      8.0       14.7      16.4

Commercial Teaching 30   M   44.6      35.8      47.0      71.9-     44.6—     51.5      20.6      87.4**    75.5*
                         SD  10.5      11.9      14.1      16.1      11.8      11.1      8.9       14.5      17.0


Table 4 


Mean Kuder Preference Record Percentiles of Senior Women by 
Business School Major 


N Mech. Comp. Sci. Pers. Art. Lit. Mus. Soc. Ser. Clerical


205 48 68 40 87 52 66 46 45 59 
32 45 88°% 50 80 48 66 46 35 7 
18 45 6 31 92 40 59 42 62 62 
30 52 35— 21 95* 79* 73 50 23— 40- 
24 5l 62 30 96** 59 6l 45 37 50 
26 a 3l, 59 34 82 45 64 55 52 8i%* 
30 39 7% 40 66 ; 36— 58 38, 68%, 76* 


level, with positive scores in the persuasive, artistic, literary and musical
areas, with negative scores in the mechanical, computational, scientific,
social service and clerical areas. The accounting group had relatively
high scores in the computational, scientific, clerical and mechanical
areas, with relatively low scores in the persuasive, artistic, literary, musical
and social service fields.




Other patterns found included the following: a relatively high literary 
interest by the finance group; positively significant social service and 
mechanical scores for the management group with negatively significant scores 
in the computational and artistic areas; one positively significant score, 
in the persuasive area, for the retailing group with negatively significant 
scores in the computational and clerical areas; and a persuasive-social 
service pattern for the sales group with negative scores in the computa- 
tional and clerical areas. 

It is important to note that these observed patterns are based on 
differences in the mean raw scores and not on percentile scores. For 
example, the mean raw score of the advertising group in the artistic area 
was significantly higher at the 1% level than the mean score of the total 
group, yet the corresponding percentile score was only 47 (Table 2). 
Similarly, the mean score of the accounting group in the persuasive area 
fell at the 88th percentile, yet it was significantly lower than the mean 
score of the base group. 

Similar but less marked patterns were found for the women seniors 
(Tables 3 and 4). Judging from the number of significant scores, the 
women students majoring in advertising had the most definite interest 
pattern with significantly high artistic and persuasive scores and low 
computational, social service and clerical scores. Students studying to 
be commercial teachers had four significant scores, high social service 
and clerical and low artistic and persuasive. Contrasted with this pattern 
was that of the secretarial students with a high score in the clerical area 
and a low score, significant at the 5% level, in the mechanical area. As
in the case of the men, the women retailing majors had only one high
score, in the persuasive area. 

The general business majors had the highest score in the computa- 
tional area and the lowest score in the persuasive area of any of the sub- 
groups. Their score in the clerical area was third high, being below 
the secretarial and the commercial teacher groups. Thus their pattern 
followed very closely the pattern of the male accounting majors. 

An extremely high persuasive percentile score, based upon the general 
norms, was found to characterize all of the groups with the possible 
exception of the commercial teacher group (Tables 2 and 4). 


Summary 


The Kuder Preference Record revealed significant differences in the
interest patterns characterizing senior students majoring in the various
curricula in the Indiana University School of Business. In practically
every case the interest patterns for the various groups followed those [...]




occupations by the test manual. The need for establishing local norms
and for analyzing percentile and raw scores carefully was indicated
by the deviation of the business school group from the general group
used in establishing the published norms.

The findings indicate that the Kuder is a useful tool in assisting
students to choose a major within a school of business.


6, 1949. 


An Analysis of Certain Factors in Serious Accidents 
in a Large Steel Plant * 


John B. Whitlock, Jr. and Clarke W. Crannell 
Miami University, Ohio 


The group selected for study was comprised of the 100 most recent 
accident reports, beginning with the date upon which the study began. 
Starting with a case dated March 14, 1947, each accident was taken in 
reverse chronological order until 100 had been recorded. Case number 
100 is dated February 20, 1944. The 100 accident records obtained in 
this manner thus include all major accidents at the Armco Steel Cor- 
poration’s Middletown Division over a period of approximately three 
years. 

For comparison with these cases, a “control,” or accident-free group 
of two hundred cases was selected from men and women who had worked 
on the same jobs during the same period of time, but who had not had a 
major accident in that time. From the accident group reports a list of 
jobs on which accidents had occurred and the frequency of accidents on 
each was compiled. The foreman of each of the departments which was 
listed as having had one or more accidents in the three-year period was 
then contacted, and from these men the names of two accident-free men 
for each accident case listed was obtained. For example, the Repairmen 
Section of the Maintenance Department was found to have the greatest 
frequency, with twelve accidents; so the names of 24 accident-free men 
were obtained. In the case of the Masonry Department, where one 
bricklayer had been injured, the names of two bricklayers who had not 
been injured were obtained. This procedure yielded a group of 200 
men and women who had worked without accident on the same jobs 
during the same period of time as had the 100 men and women listed as
accident cases. 

For all individuals, the following information was transcribed from 
the company records to a specially prepared data sheet: (1) name; (2) 

check number; (3) age; (4) marital status; (5) dependents; (6) average 
weekly wage; (7) World War II veteran; (8) physical rating, as decided

* The data treated in this [...] ARMCO Steel Corporation [...] fullest
cooperation in opening their files and supplying much advice and assistance.

Because one job on which there had been an accident has since been discontinued,
the control group actually consists of 198.




by the company doctor at the time of employment; (9) height; (10)
weight; (11) blood pressure; (12) vision; (13) test scores: a. Otis Test
of Mental Abilities; b. Bennett-Fry Mechanical Comprehension; c.
Bernreuter Personality Inventory, all six percentile scores; (14) education;
(15) company service; and (16) job service.
For the accident cases, the following additional data were transcribed
to the record sheet: (17) date of accident; (18) day of week; (19) time of
day; (20) amount of shift worked; (21) whether doing usual job; (22)
[...] accident occurred; (23) classification of accident: a. days
lost, b. part of body injured, c. type (crush, burn, cut, fracture, etc.),
d. disability (temporary total or partial, permanent total or partial, or
[...]); (24) description of accident; (25) how it happened; (26) why it
happened; (27) cause; (28) responsibility; and (29) previous major
accidents, if any, and classification.
Because the records of height, weight, blood pressure, and vision
had often been made months or years prior to the accident, it was not
considered feasible to include these variables in the investigation and
they were not considered.
The test scores (Item 13 above) were not available on all cases because
testing did not begin until 1942 and many of the men with whom
this study is concerned were in the employ of the corporation prior to
that date. Among the accident cases, 47 individuals had taken two or
more of the tests. Among the control individuals, 62 to 65 had taken
one or more of the tests.
The company reports of major accidents are divided into three groups
on the basis of responsibility. This responsibility is determined by an
accident investigation committee which meets as soon as possible after
the occurrence of the accident. In the present study, the total group
of accident cases was divided on the basis of the committee's decision
into the following subgroups: A, those who were totally responsible for
their accidents; B, those who were jointly responsible for their accidents;
C, those who were injured by the action or lack of action of someone else.


Results 


Non-test Data. No item of non-test data was found to differentiate
significantly between the accident and accident-free groups. This was
true not only for each subgroup among the accident cases as divided
according to responsibility, but also for certain subdivisions of the control
group made on the basis of length of company service. Table 1
summarizes these non-test data.

Mechanical Comprehension and Otis Test Scores. These two tests
did not reveal any significant relationship to the occurrence of accidents.



Inspection of the Otis Test scores seemed to indicate that these scores 
increased in the control group with age and company service. An analy- 
sis of variance applied to these scores in terms of Age and Service demon- 
strated that the variance was significantly related in approximately 
equal amounts to both of these variables. The possibility is therefore 
not excluded that, were it possible “experimentally” to equate the 
accident and control data with regard to age and service, a relationship 
to Otis Scores might be found. 


Table 1
Mean Values for Non-test Data

                                              Company
                                   Weekly     Service
Group         Age     Dependents   Wage       (days)     Education
Accident A    39.0    2.00         $52.92     3255.5     7.84
Accident B    31.8    1.87          48.91     2168.9     8.94
Accident C    34.9    2.00          49.42     3251.4     8.73
Control       41.8    2.82          57.80     5128.0     7.88


Bernreuter Personality Inventory. The company records available to 
the present writers presented the Bernreuter scores in terms of percentiles. 
For the purpose of statistical computations, these scores were converted 
into T-score equivalents. Table 3 shows the means and standard devia- 
tions for each of the six Bernreuter scales. The writers are fully aware 
of the hazards in interpreting the scores of personality measures which 
have been obtained in industrial situations. It is quite probable that 
applicants for a job will give what appears to them to be a “correct” 
answer on such an inventory, rather than an answer which may reveal
their real opinions of themselves. It should therefore be kept in mind
when reading the following discussion, that the terms alluding to personality
traits are employed for convenience in identifying the scales according
to the system employed by the author of the inventory, and it


Table 2 
Mean Test Scores by Accident Groups 

Accident Mech. Compr. Score Otis Score 

Group N M N M 
Accident A 9 30.9 10 82.9 
Accident B 17 33.3 17 93.4 
Accident C 21 31.3 21 87.8 
Control 62 34.0 64 84.5 


should not be implied that the writers believe the scores to be truly
indicative of any unitary personality trait.
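
The conversion of percentiles to T-score equivalents mentioned above can be sketched as follows. The writers do not describe their procedure, so this is a minimal illustration that assumes the common normalized T-score: the percentile is turned into a normal deviate z, and T = 50 + 10z. The function name is hypothetical, and the three percentiles in the loop are arbitrary examples rather than values from the company records.

```python
from statistics import NormalDist


def percentile_to_t(percentile):
    """Map a percentile rank (strictly between 0 and 100) to a normalized T-score.

    Assumes the conventional T scale: mean 50, standard deviation 10.
    """
    if not 0 < percentile < 100:
        raise ValueError("percentile must lie strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile / 100)  # normal deviate for this rank
    return 50 + 10 * z


# Arbitrary illustrative percentiles, not data from the study.
for p in (16, 50, 84):
    print(p, round(percentile_to_t(p), 1))
```

One reason for such a conversion is that percentile units are unequal along the scale, so means and standard deviations are usually computed on a standard-score scale instead.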

Some of the mean differences between accident and control groups,
computed from data summarized in Table 3, were found to be significant
when Fisher's t test was applied. On the B1-N scale (neurotic tendency)
the mean T-score for Accident Group A was significantly lower (less
"neurotic") than the control group at the 5% level of confidence (t: 2.06).

A comparison between control group and Accident Groups
A and B combined yielded a significant difference at the 1% level.
Comparable results were obtained for the B3-I scale (introversion-
extroversion). One further significant difference was found: between
Accident Groups A and B combined and the control group for the F1-C
scale (confidence in oneself), the mean for the combined accident groups
was significantly lower (more "self-confident") at the 1% level.
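
The first comparison above can be approximated from the summary figures in Table 3. The writers do not give their computing formula, so this is only a sketch that assumes a pooled-variance t for two independent groups, with the printed standard deviations treated as n - 1 estimates. The function name is hypothetical. The inputs are the B1-N means and standard deviations for the control group (N = 65) and Accident Group A (N = 10).

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's t for two independent groups, computed from summary statistics.

    Assumes the SDs are n - 1 estimates and that the variances can be pooled.
    """
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se_diff, df


# B1-N scale: control group vs. Accident Group A (values from Table 3).
t, df = pooled_t(45.3, 7.61, 65, 40.2, 6.73, 10)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

Under these assumptions the statistic comes out close to, though not exactly at, the reported 2.06. The gap may reflect rounding in the printed table or a slightly different standard-error formula.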


Table 3
Mean T-Score Equivalents of Bernreuter Personality Inventory

                     B1-N            B2-S            B3-I
Group         N      M      SD       M      SD       M      SD
Accident A    10     40.2   6.73     49.8   6.34     39.0   6.00
Accident B    17     36.9   8.02     48.9   8.02     36.5   8.84
Accident C    20     43.1   8.96     50.6   5.95     41.4   7.86
Control       65     45.3   7.61     48.0   6.60     43.4   6.60

                     B4-D            F1-C            F2-S
Group         N      M      SD       M      SD       M      SD
Accident A    10     52.5   5.41     42.9   7.23     42.6   6.40
Accident B    17     52.9   7.12     41.5   9.39     88.7   6.16
Accident C    20     51.6   6.43     43.1   9.16     48.9   6.07
Control              49.1   6.69


Inspection of the means in Table 3 reveals that among the first
four scales the accident groups present alternately higher and lower values
as compared with the control group. The writers examined the individual
data for each scale, and discovered that individuals who were considered
totally or partially responsible for their accidents seemed to have
greater discrepancies in their scores from one scale to the next than did
individuals who were accident-free or not considered responsible
for their accidents. This feature of the data was most evident when the
B3-I and B4-D (dominance-submission) scales were compared. By
subtracting the B3-I score from the B4-D score for each individual,
difference scores were obtained which were preponderantly positive for
Groups A and B; only one in Group A and three in Group B were zero or
negative. On the other hand, 22, or one-third, of the cases in the control



group showed zero or negative differences between these measures. The 
mean difference scores for Groups A, B, C and control were 14.0, 16.5, 9.8 
and 5.9, respectively. The means for Groups A and B are each signifi- 
cantly different from the control group at the 5% level or better. While 
there is considerable overlap among difference scores of the various groups, 
there does appear to be a significant trend for accident cases to have 
higher B4-D scores (“dominant”) and lower B3-I scores (“extroverted”).


Summary 


From the accident report records of the ARMCO Steel Corporation, 
Middletown, Ohio, data were collected on 100 accidents which occurred 
over a three-year period ending in March 1947. Besides the accident 
data, personal data and test scores (where available) were obtained for 
these individuals and for a control group composed of two accident-free 
employees for every accident case. A statistical analysis was made with 
the following results: 

1. In this particular study, none of the “non-test” data was found 
to differentiate between the accident and control groups. 

2. Mechanical Comprehension and Otis Intelligence Test scores were 
not found useful for accident prediction purposes, but the Otis was found 
to be greatly affected by age and company service. 

3. Three of the Bernreuter Personality Inventory Scales seemed to
differentiate to some degree between accident and accident-free groups.
In the nomenclature of this Inventory, the accident cases appeared less
“neurotic,” less “introverted” and more “self-confident.” Especially
noteworthy was a tendency among the accident cases to have high B4-D
(dominance) scores and at the same time low B3-I (extroversion) scores.
In view of the wide overlap among groups in these measures, and also
considering the rather speculative nature of personality assessment by
means of inventories, evidence of this sort can hardly be taken as a
reasonable basis for exclusion from employment. Nevertheless it does 
suggest that employees who conform to the “personality pattern” of such 
accident cases should be subject to closer scrutiny when placed on jobs 
involving the possibility of serious injury. 

The small number of cases studied here, and the limitation of the
study to one industry, necessarily limit our conclusions to suggestions 
for further study. It is definitely believed that the variability among 
Bernreuter Personality Inventory scores would make a profitable subject 
for further study with a large number of cases. 

Received May 31, 1949.

Early publication. 


Visual Performance and Accident Frequency 


Joseph Tiffin 
Purdue University 


and 


B. T. Parker and R. W. Habersat 
Bausch & Lomb Optical Company 


The importance of adequate vision as a safety factor has received
recognition from industrial safety engineers in recent years.
Studies have pointed out the desirability of requiring industrial
applicants to meet minimum safety visual requirements at the time
of employment. In some instances these studies have shown decreases
in injury frequency following the introduction of such employment
standards. This paper summarizes the results of a recent experiment in a light
manufacturing industry. The data confirm previous evidence that low
visual performance and injuries frequently go hand in hand. In this
plant the use of a minimum visual safety standard for employment on all
jobs probably materially reduced injury frequency and compensation
costs.


Procedure 


The medical and accident records in a large optical company were
examined and employees who had experienced three or more injuries¹ in
the previous 18 month period were identified. These were considered
“high frequency of injury” employees compared with the average experience
in this plant. It was planned to compare the visual performance
of this group with that of an injury free group to see whether there were
significant differences.

Age, education, experience and the job hazards have all been shown
to be related to injury susceptibility (1, 3, 5, 6 and 8). In setting up
the injury free or control group these factors were therefore carefully
controlled. This was done by matching each employee in the accident
group with an employee on the same job, and having the same age,

¹ Types of injuries included were: fractures, bruises or contusions, sprains, cuts,
[...] etc., or any other types of injuries that could possibly have been
caused by low visual performance. Such injuries as hernias and back strains were not
included.




education and experience, but who was “accident free” during the 18 
month period used for the study. This group of “accident free” em- 
ployees made up the control group. 

It was also necessary to make sure that none of those included in the 
two groups had received professional eye attention during the 18 month 
period of the study since such eye care undoubtedly would improve visual 
performance which might in turn have reduced the employee’s injury 
susceptibility. Since complete eye examination records were available 
on most employees in the plant’s own eye clinic, it was possible to deter- 
mine in most cases in advance of testing which employees had received 
professional eye care in this time. As an additional check after the 
pairing was completed and the testing begun, each subject was questioned 
carefully concerning the last date eye care had been obtained. Where 
either one of the pair had received professional eye attention during the 
previous 18 month period, the pair was dropped from further study. 
After this last control had been applied 42 matched pairs remained, a total 
of 84 employees participating in the experiment. 

The visual performance of each individual was tested on the Bausch 
& Lomb Ortho-Rater (4), a precision instrument measuring the visual 
skills listed in the first column of Table 1. 

On completion of the visual performance testing, means were com- 
puted for the raw scores of each group on the vision tests. Critical ratios 
were also computed to determine the statistical significance of any 
existing differences (2). Both means and critical ratios are shown in 
Table 1 in comparison with those obtained in a similar study conducted 
in a heavy industry (7). 


Results 


Table 1 shows that on the average, the injury free group of em- 
ployees had superior visual performance. 

In three visual skills—acuity worse eye (near vision), acuity, right eye 
(near vision) and color perception, differences exceeded the 5% level of 
significance. In Stump’s study (7) the critical ratios were high for 
several distance visual skills, but very low for near visual skills. This 
may be due to the fact that on the jobs in a heavy industry, distance 
visual skills are more frequently required in performing the job. On the 
other hand, for the jobs covered by the present study in the optical industry,
near visual skills are essential and are therefore important for safety.

These studies would seem to indicate that somewhat different patterns 
of visual skills may be required for safety on various jobs in different 
industries. Each plant should probably conduct its own investigation 




Table 1 


ls Visual Performance Scores and Critical Ratios of Injury Free and 
High-Frequency-of-Injury Employee Groups in Two Independent Studies 


Study No. 1 (gt Tad, gait 
0. reseni 
Th d p! 


Phoria (Far) 
Phoria (Far) 
ty, Both Eyes (Far) 
Right Eye (Far) 
y, Left Eye (Far) 
Better Eye (Far) 
, Worse Eye (Far) 
th (Far) 
(Far) 


y, Left Eye (Near) 
' Better Eye (Near) 
n Worse Eye (Near) 
al Phoria (Near) 


Significant at 1% level or less.
Significant at 5% level or less.
These critical ratios were computed from Ms-Ma.


on a fact-finding basis to determine what visual skills are required for safe
operation, and establish visual safety standards accordingly.
An examination of Table 1 reveals that in general the level of
visual performance of the workers studied in the first experiment was
considerably higher than that of the workers in the present study. This
may be due to a difference in the average age levels of the individuals
included in each study, to job differences, or differences in the sources of
[...] (8).




Value to Management 


Since 1943 visual safety standards have been in use in this company 
for selection and placement of new applicants, and referral of present 
employees for eye care. During this entire period compensable accident 
costs have steadily decreased compared with the previous four year 
period. Due to the fact that changes in safety procedures are constantly 
taking place in all progressive plants interested in safety, it is, of course, 
difficult to ascertain how much, if any, of a decrease in injuries or accident 
costs can be attributed to one specific part of the overall program. Since 
the addition of the vision program was the only major change in the safety 
procedure in the plant studied during the period of this investigation, it 
is probable that it played a substantial role in the reduction of direct com- 
pensation costs by an average of $16,600 per year, which occurred over 
a four year period. Studies of small groups of employees in the factory 
have shown a definite decrease in injuries after professional eye attention. 
Further investigations of the effect of eye care on reduction of injuries 
are now in progress. 


Received February 23, 1949. 
References 


1. Chambers, E. G. A preliminary inquiry into the part played by character and
temperament in accident causation. J. ment. Sci., 1939, 85, 115-118.

2. Guilford, J. P. Psychometric methods. New York: McGraw-Hill Book Co., 1936,
548-549.

3. Henig, M. S. Intelligence and safety. J. educ. Res., 1927, 16, 81-87.

4. Jobe, F. W. Instrumentation for the Bausch & Lomb industrial service. Bausch &
Lomb Magazine, 1944, 20, 6-7, 14-15.

5. Lipmann, O. Unfallursachen und Unfallbekämpfung. Berlin: 1925.

6. Schmitt, E. Unfallaffinität und Psychotechnik im Eisenbahndienst. Industrielle
Psychotechnik, 1926, 3, 144-153, 364-366.

7. Stump, N. F. A statistical study of visual functions and safety. J. appl. Psychol.,
1945, 29, 467-470.

8. Tiffin, J. Industrial psychology. New York: Prentice-Hall, Inc., 1947, 13, 425-430,
435-439, 443-444; and 228-230.


Attention and Involuntary Movement 


Austin S. Edwards 
University of Georgia 


In a situation such as automobile driving, how important is involuntary
movement? To what extent is uncontrolled movement different
with fixed and with shifting attention? To what extent is two-armed
driving more steady and controlled than is one-armed driving?
It is not sufficient to know that certain conditions are important for [...].
It is desirable to know as accurately as possible to what extent
certain conditions modify the control of activities in the total behavior
of individuals who are involved in skilled work or in what may be dangerous
occupations. Although inferences cannot be made directly from
laboratory experiments to such activities as automobile driving, it is
[...] that the following experiments may throw some light upon the


Two experiments have been performed for the purpose of discovering
quantitatively the effect of certain conditions upon involuntary movement.
The involuntary movement chosen was finger tremor, arm extended
with no rest, since that can be accurately measured in three dimensions
with the writer's finger tromometer.¹ The two conditions especially
studied were distractions with attention held upon a fixation spot without
shifting, and, second, the effect of distractions when the attention

Experiment 1. Involuntary Movement with Fixed Attention 


This experiment was performed in the dark room with the light
reduced to one foot candle. The distractions were an automobile horn
from a Ford automobile, actuated by six volts, and the bright light from
a Plymouth automobile, actuated by a current of six volts. The horn
was in a box about five feet in front of the S, where it could not be seen.
The light was placed about ten feet in front of the S slightly to the left
and arranged so that it could easily be moved in any direction and shone
into the eyes of the S. The finger tromometer was on the table directly




Procedure. Standard procedure was used with the tromometer, the 
S being allowed to rest before the measurements began, and the measure- 
ments consisting of the sum of the three readings—front-back, right-left, 
up-down. The time of measurement was thirty seconds. 
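
The scoring rule just described, and the percentage comparison between control and distraction trials used in the Results, can be sketched as follows. This is only an illustration. The function names are hypothetical, and the readings are invented for the example (they are not data from the experiment), chosen to fall in the range of averages the article reports.

```python
def tremor_total(front_back, right_left, up_down):
    """Total tremor for one 30-second trial: the sum of the three readings (mm)."""
    return front_back + right_left + up_down


def percent_increase(control, experimental):
    """Percentage change of an experimental trial relative to the control trial."""
    return 100.0 * (experimental - control) / control


# Hypothetical readings in millimetres, invented for illustration only.
control = tremor_total(12.0, 14.0, 11.0)
with_horn = tremor_total(13.0, 15.5, 12.0)
change = percent_increase(control, with_horn)
print(f"{control:.1f} mm -> {with_horn:.1f} mm ({change:+.1f}%)")
```

Summing the three axes gives a single figure per trial, which makes the control and distraction conditions directly comparable for each S.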

The first control measurement was made before any distractions 
were used, and then measurements were taken while either the light or 
the horn, or both, were used as distractions. Following these three ex- 
perimental measurements, a second control measurement was made. 
When the light was directed into the eyes of the S, or the horn was 
blown, the time consumed by the distraction was approximately twenty 
of the thirty seconds. Order of stimuli was varied so that each stimulus— 
light, sound, or both together—appeared with varying Ss as either first, 
second, or third stimulus. The order of stimulation was recorded for 
each S so that not only could the results of each stimulating condition be 
studied, but also the effect of the stimuli as regards order, namely, first, 
second, or third, could be studied. 

Subjects. One hundred Ss, unselected college students, half men and 
half women, were used in this experiment. The ages were 17 to 25. The 
Ss were asked whether they had been in any accidents, automobile or 
other, and notes were made as to such accidents, their number and 
seriousness, 


Instruction. The following instruction was used: “Lean back com- 
fortably; both feet on the floor; hold your hand as steady as you can, and 
watch fixation spot during the testing.” 

The sound of the horn was probably somewhat louder and more 
disturbing than is found in traffic, and the automobile light shining into 
the S's eyes was closer and probably brighter than is usually found in 
actual driving situations. Ss sat about five minutes before the experi- 
ment began. 


Since it might soon become known among students that sound and 
light distractions were being used, conditions were made as equal as 
possible for all Ss by telling them at the beginning of each series of measurements
that there would be a control measurement, light and sound
distractions, one or both; after the last measurement with distraction
another control measurement was taken to compare with the first control.

Results. Detailed results were worked out with means, medians, 
standard deviations, sigma of the mean, Q1 and Q3. Analyses were made
for men and women separately, both for the light stimulus and sound 
stimulus, and both together, and for the first, second, and third stimulus. 
This permitted finding out first whether greater effect might be found in 
finger tremor because of the sound, the light, or both, and second, whether 
the first, the second, or the third stimulus had more effect. 
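The summary statistics listed above, and the critical ratio used later to compare means, can be sketched as follows. This is a minimal illustration, not the author's computation: the scores are invented, the function names are hypothetical, the sample (n - 1) standard deviation is assumed, and the critical ratio is taken in its uncorrelated-means form, which the article does not confirm.

```python
import math
import statistics


def describe(scores):
    """Return the summary statistics named in the Results paragraph."""
    n = len(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1); an assumption
    q1, _median_cut, q3 = statistics.quantiles(scores, n=4)
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "sd": sd,
        "sigma_mean": sd / math.sqrt(n),  # standard error of the mean
        "q1": q1,
        "q3": q3,
    }


def critical_ratio(first, second):
    """Difference between two means divided by its standard error.

    Uses the uncorrelated-means form; whether the original analysis
    allowed for correlation between measurements is not stated.
    """
    a, b = describe(first), describe(second)
    se_diff = math.sqrt(a["sigma_mean"] ** 2 + b["sigma_mean"] ** 2)
    return (b["mean"] - a["mean"]) / se_diff


# Hypothetical tremor scores in mm., invented for illustration only.
control = [36.0, 38.5, 40.0, 37.5, 39.0, 41.0]
distraction = [37.0, 39.5, 41.5, 38.0, 40.5, 42.0]

summary = describe(control)
cr = critical_ratio(control, distraction)
print(summary)
print(round(cr, 2))
```

A critical ratio well below the conventional threshold of about 3 would be read, in the usage of the period, as a difference not reliably greater than chance.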


Attention and Involuntary Movement 505 


It was somewhat surprising to find that on the average there was no
statistically significant increase in finger tremor. Both stimuli together
had no appreciable effect greater than one of the distractions alone.
Although it might be expected that the first distraction might be more
disturbing and the later ones less, or that with the second and third
distractions the S might become more upset, neither result appeared.
With all the averages running from about 37 to not quite 43 mm., the
increase in finger tremor caused by the distractions was for the
men [...] per cent and for the women 9 per cent. None of the critical ratios
between the means was significant for the entire experiment, the highest
being [...]66, which indicates results not much better than chance.

It might be expected that those students who had been in automobile
accidents or who had had serious traumatic experiences would be more
disturbed than the others. No such evidence was found. Some of those
who had been in the worst accidents had the lowest finger tremor
throughout the experiment.

The only positive results that can be stated are in the cases of a few
students who were very greatly disturbed and who showed
greatly increased finger tremor during one or all of the experimental
measurements. Of these Ss, and considering those whose finger tremor
increased 50 per cent to more than 100 per cent, there were 16 men
and 12 women.

Of the 16 men considerably affected by sound, light, or both distractions
together, it appeared that both distractions together affected the
men most and most frequently, 12 of the 50; sound affected very
considerably 6, and light 4 of the 50 men.


Of the 12 women, [...] were affected by both distractions together; 9 by sound, and 6 by light.
It should be noted that some of the 16 men and some of the 12 women were
affected by more than one of the distracting stimuli.

Considering the order of distractions, first, second or third, for the 16
men and 12 women greatly affected, 6 men were most affected by the
first distraction, 6 by the second, and 4 by the third. Of the 12 women,
[...] were most affected by the first distraction, 6 by the second, and only
[...] by the third.


Considering these cases and our averages for the 100 Ss altogether, there
is no evidence that two distracting stimuli cause more involuntary
movement than one of them alone; or that a series of distractions cause more
increase in involuntary movement than do a first or second distraction.
It should also be noted that there is no evidence of a general build-up
of disturbing influence caused by a series of three disturbing stimulus


506 Austin S. Edwards 


situations. This is, of course, not without exceptions in certain of the 
Ss. But the first control average was 38.29 mm. and the second control 
average following the distractions was only 39.46, an insignificant dif- 
ference of only 1.18 mm. 

Conclusions. It appears from this experiment with steadily fixed 
attention that, with certain exceptions (16 of the 50 men, and 12 of the 50 
women), students selected at random and irrespective of experience with 
accidents showed on the average no statistically significant increase in 
finger movement under conditions which were assumed to be considerably 
distracting and might have been expected to be quite disturbing. 

On the other hand, 32 per cent of the men and 24 per cent of the 
women had very considerable increases of involuntary movement. 
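The percentages quoted here follow directly from the counts given in the first conclusion (16 of 50 men, 12 of 50 women). A short check, with a hypothetical helper name, is sketched below.

```python
def percent_affected(affected, group_size):
    """Share of a group, in per cent, showing a marked increase."""
    return 100.0 * affected / group_size


# Counts are those stated in the conclusions: 16 of 50 men, 12 of 50 women.
men_pct = percent_affected(16, 50)
women_pct = percent_affected(12, 50)
overall_pct = percent_affected(16 + 12, 100)

print(men_pct, women_pct, overall_pct)
```

The combined figure for all 100 Ss is not stated in the article; it is included only to show how the two group percentages relate to the whole sample.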

Considering all of the cases and the specially disturbed Ss altogether, 
there is no evidence that the two disturbing stimuli were more disturbing 
than was one at a time; nor that the third distraction had any more 
effect than the first or second. 

There was no build-up of disturbing effect since the first control 
experiments and the last showed no significant difference in average. 

Some suggestion appears from this experiment to corroborate what 
we already know, namely, the importance of fixed attention in connection 
with motor control. 


Experiment 2 Involuntary Movement with Shifting Attention 


In this experiment conditions were changed in several ways. Preliminary
tests were made with the same set-up described in Experiment
1, but with the instruction to S to shift attention and to look sideways 
during the measurements. The preliminary experiments indicated de- 
cidedly different results and led to the development of an experiment 
in which the steering wheel of an automobile was fastened to two units 
(front-back, right-left) of the tromometer. The steering wheel was 
mounted so that it was at the height and angle of the steering wheel in 
a car. The tromometer was placed so that the top of the steering wheel
was between the two units of the tromometer that were used for measure- 
ments. Movement of the wheel thus moved one of the riders on the tro- 
mometer. Movement in the opposite direction moved the other rider. 
In this set-up the control experiment might have practically or almost 
zero recorded, because with both hands on the wheel it was possible to 
hold it very steady. Also, two riders instead of three were engaged. 
All measurements, both control and experimental, were thus very much 
reduced. 

Procedure. S was given time to rest before the control measurements
were taken. He was told that during the experiment he was to remain
as quiet as possible, to keep his feet flat on the floor, and to keep his
eyes on the fixation point unless told otherwise. He was to be as
comfortable as possible in the chair, which was placed at a comfortable
position for the S for holding the wheel. The first control measurement
was made with S placing both hands on the wheel and holding his
attention steadily on the fixation spot, which was three feet in front of him.
Each measurement was thirty seconds. After S had rested, the first
experimental measurement was taken; S was in the same position as in
the control measurement, but after fifteen seconds he was told to look out
of the window, which was six feet to his right. After four seconds S
was told to look back at the fixation spot. The second control
measurement was taken with only one hand on the steering wheel. S was told
to use the hand with which he wrote, and to hold attention steadily on
the fixation spot. The next experimental measurement was made with
one hand on the wheel, but after fifteen seconds S was told to look out of
the window. After four seconds S was told to look back at the fixation
spot. The third experimental measurement was made by having S hold
his hands on the wheel, but after fifteen seconds S was told to take the
pencil which was handed to him by E and then to put his hand back on the
wheel. The pencil was handed to him at a distance of two feet. There
were thus two control and three experimental measurements.

Subjects. In this experiment there were 60 men, aged 18-35, and 40
women, aged 18-24, all college students selected at random.

Results. The results in this experiment are in direct contrast with
Experiment 1. For the 100 students, with the men and women
separately, the effect of shifting attention was great and
consistent, all differences having definitely significant critical ratios, 2.526 to 9.71.
For the men, the means showed an increase from the first control to the
first experimental situation on the average from 3 mm. to 4.6 mm. The
second control was 6.18; the second and third experimental ratings 9.05
and [...]48 respectively. For the women, the increase from control to
experiment was from 2.38 mm. to 2.85, and from the second control
[...] mm. to third and fourth experimental measurements 6.93 and 11.7.
The smallest percentage increase was 40 and the largest percentage
increase was 158. The average percentage increase for all Ss was 80.9;
for men, 82.25, for women, 79.5. See Figure 1.

As compared with the first controls (both hands on the wheel and
fixed attention) other conditions showed increases of uncontrolled
movement of 300 to 400 per cent. On the basis of the averages, and taking
the first control series for the men and for the women, shifting the
attention to reach and take a pencil increased involuntary movement for the
men, 4.46 times, and for the women, 4.91 times. If so great increase of




involuntary movement takes place in the relative quiet of the laboratory, 
how much is to be found under the more disturbing conditions of actual 
automobile driving? 
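The "times" and percentage figures in this section are simple ratios of mean movement in mm. The sketch below, with hypothetical helper names, recomputes two of them from means that are legible in the text: the women's first control (2.38) against the pencil-taking series (11.7), and the men's first control (3) against the first shifting-attention series (4.6). The reported means are rounded, so agreement with the printed factors is only approximate.

```python
def times_increase(control_mm, experimental_mm):
    """How many times greater the experimental mean is than the control."""
    return experimental_mm / control_mm


def percent_increase(control_mm, experimental_mm):
    """Increase of the experimental mean over the control, in per cent."""
    return 100.0 * (experimental_mm - control_mm) / control_mm


# Means as printed in the Results paragraph (mm.).
women_ratio = times_increase(2.38, 11.7)
men_first_shift_pct = percent_increase(3.0, 4.6)

print(round(women_ratio, 2))
print(round(men_first_shift_pct, 1))
```

The women's ratio comes out close to the 4.91 given in the text; the small difference is consistent with rounding or truncation in the printed means.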

Reference to Figure 1 also shows a very significant difference in con- 
trol, when only one hand is used instead of two. The uncontrolled move- 
ment is about twice as great. 

Conclusions. With shifting attention and slight distraction (no 
bright lights or loud sounds), the increase in involuntary hand and arm 
movement was large, consistent, and statistically significant. Although 
one cannot draw conclusions directly from these experiments to what may 
be expected to happen in such a situation as driving an automobile, the 
question is raised as to how much danger exists in driving on account of 


Fig. 1. Increased hand and arm movement with shifting attention. Tr indicates
the series of trials: 1, the first control series (2 hands, fixed attention); 2, the
first experimental series (2 hands, shifting attention); 3, the second control series
(1 hand, fixed attention); 4 and 5, the experimental series that followed 3 (1 hand,
shifting attention; take pencil). Horizontal lines are in mm., the upper line for
men, the lower for women. There were 60 men and 40 women.
The C.R.'s between means were all from 2.526 to 9.71.



involuntary movement over which the driver has no control. We may 
perhaps surmise that with the varying conditions of driving or of similar 
occupations, the varying amounts of monotony, emotional excitement, 
and various distractions might produce much more uncontrolled move- 
ment than we have found in the laboratory. 

Considering the two experiments together it appears that hand and
arm movements are relatively small with fixed attention; and that
involuntary movement is relatively great when attention is shifting,
especially when one is using one hand for a second operation.

It also appears that in a considerable number of cases involuntary
movement of considerable amount may occur even when there is steadily
fixed attention.




The great increase of uncontrolled movement when only one hand
is used is of no small significance for automobile driving in traffic.


Practical Implications 


While it is clear to those who are informed that individuals suffering
from certain abnormal conditions such as epilepsy and general
[...] should not be permitted to engage in occupations where uncontrolled
movements might happen, it has not been clear that there are
[...] normal individuals and so-called normal situations that should
require limitation of occupation in the interest of safety to self and to
others. A more accurate and thorough knowledge of the involuntary
movements that may occur with easily disturbed individuals and for
all “normal” people under certain disturbing conditions may
find place in connection with our industries, automobile driving, and
other occupations which demand steady neuro-muscular control. Too
little is known and known accurately about the relation of involuntary
movement to the total behavior pattern.

The question is forcibly raised: How dangerous is the one arm driving


January 17, 1949.


512 Book Reviews


are a willingness to use subjective methods as a supplement to objective
tests, the development of projective and situation tests, and the routine
use of the intensive case conference. At the same time, lack of familiarity
with the principles and practice of personnel psychology resulted in some
unnecessary mistakes and in some “findings” which have long been
familiar to workers in the field. To cite just one of several possible
examples, despite an excellent analysis of the complexity and unreliability
of their own criteria, they state (on page 397) that “If the job is running a
lathe in a large factory the rate of piece-work production is a satisfactory
criterion . . .,” whereas various studies have shown that even as
“objective” a criterion as output may be an unreliable and invalid index of a
worker’s success.

The organismic as opposed to the elementalistic approach to measure- 
ment is described and evaluated in detail, for perhaps the first time in a 
practical situation. Many of the techniques used by the OSS Assessment 
Staff were not new, and were not proclaimed as new, but they were 
thoroughly tried, revised in the light of experience, written up in detail, 
and statistically analyzed. The psychologist or personnel worker in- 
terested in the testing possibilities of social and practical situations, or in 
the use of the interview and of the case conference in evaluating leadership 
ability or ability to adjust to trying situations, will find many worthwhile 
suggestions in this account of the trial of these methods. This source is
unusual among clinical studies, because the Assessment Staff was self- 
critical, checked for personal and systematic bias, and made all possible 
statistical analyses of their data as their work progressed. 

The organismic method attempts to predict future behavior by in- 
ductive thinking from a set of observed facts to a conception (the hypo- 
thetical formulation of a personality), and then inductively to predict 
behavior in an anticipated situation, whereas the elementalistic approach 
attempts to predict future facts directly from observed facts. 

It is especially interesting to compare the conclusions concerning the
relative validity of the two methods reached by the authors with those of
[...] workers in one of the other major personnel selection programs of
the Armed Forces. The OSS psychologists state on page 227: “It is the
contention of S (one OSS unit) that more can be learned of people by
[...] with them as ‘good Joes’ than by testing them as professors.”
([...] will be pardoned for stating that this seems to him
[...] discovery of someone who has previously done his testing as a
[...] laboratory man or clinician detached from the situation [...],
who has used tests as only one method of gathering data, and who has had
to be a “good Joe” all along in order to do any kind of testing!) In
contrast with this statement concerning the superiority of observational
and projective techniques, one should consider the conclusions drawn from
military testing experience by factor analysts such as Guilford in a recent
article in the Psychological Review, proposing a selection procedure which
is even more elementalistic and [...] organicists of the OSS. There is no
space in this review to explore these conflicting views, but they should be
mentioned in juxtaposition, and some comparative data cited.
The OSS authors state that no adequate comparisons have been
made between organismic and elementalistic approaches, but a few
inadequate comparisons could be made. The AAF Aviation Psychology
Program attempted to validate a number of clinical or organismic tests
against success in flying training. Although time did not permit
assessment by case conference procedures using all available information, single
organismic techniques (e.g., interviews and Rorschach interpretations)
were validated and proved to have no predictive value, while a number
of single objective tests had validities in the .30s and .40s. In another
study a psychiatrist and a psychologist interviewed cadets with
borderline test scores and found that their clinical or organismic evaluations were
no better than chance in predicting flying success in this group.
Despite their zeal for their procedures, the OSS staff frankly point 
out their defects (“Sometimes . . . we did not know who was deceiving 
whom . . .” p. 142) and, if anything, underrate their demonstrated
validity. Concerning the validity of the assessment procedure, they 
state: “the final validity is a question mark” (p. 392). But this point 
bears consideration. 
The validation procedure was made difficult by the scattering of OSS
members over the world and by the lack of uniformity in assignments.
It was made even more difficult by the fact that no attention had been
paid to establishing criteria of success, a natural omission in the pressure
of the early days of the war. Despite these and other problems, the
average validity for the two principal assessment stations, for a sample of
171 men followed up and rated for job performance overseas by the
Overseas Assessment Staff, was .45. There were no control groups to permit
a comparison of the effectiveness of these with other methods, but taken
by itself this is not a negligible validity coefficient.
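A validity coefficient of this kind is ordinarily a correlation between the assessment rating and the later criterion rating. The sketch below assumes a product-moment (Pearson) correlation, which the review does not specify, and uses invented ratings for eight hypothetical men; it illustrates the calculation only and does not reproduce the OSS data.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ratings on a five-point scale, invented for illustration.
assessment = [3, 4, 2, 5, 3, 4, 1, 5]   # rating at the assessment station
performance = [2, 4, 3, 4, 3, 5, 2, 3]  # later rating of job performance

validity = pearson_r(assessment, performance)
print(round(validity, 2))
```

Averaging coefficients across stations, as the review describes, would be a further step not shown here.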
A few interesting findings should be cited, to whet the appetites of
possible readers. The motives of OSS volunteers were not generally
ideological, but largely professional: trained men sought opportunities
for worthwhile specialized experiences (p. 247; or do Americans tend to
play down their idealistic motives?). Leadership is a relatively general




trait (p. 303). Cultural differences change the nature of situation tests: 
Chinese candidates ascribe leadership to their friends rather than to 
leaders (p. 352). Clinical psychologists are better able to diagnose than 
to predict, many apparent insights into personality being wasted because 
of lack of knowledge of the prediction situation (p. 430). There is a 
tendency to overevaluate outgoing persons with egocentric motives and 
low integrity, emotionally unstable individuals whose manifest behavior 
is acceptable, and persons of low ability but possessed of goodwill and 
good social relations (p. 438 ff.). There were no systematic errors of 
underrating. Assessees gain considerable self-insight in the assessment 
process (p. 201). Traumatic experiences are common in the backgrounds 
of normal persons (p. 468). 

Suggestions for future work are discussed in the last chapter. Again 
it may be worth pointing out the parallel with World War I, which 
brought ex-Army psychologists together in the postwar Scott Company 
and the Personnel Research Federation. After World War II some of 
the leading ex-Army psychologists founded the firm of Richardson, 
Bellows and Henry; a number of ex-Air Force psychologists launched 
the American Institute for Research; several established ex-Navy psychol- 
ogists struck out on their own or went into business organizations; and 
now the final chapter of the OSS book reads almost like a request for a 
research grant. The Assessment Staff point out the possibilities inherent 
in their technique for the selection of executives and other key contact 
personnel (possibilities being capitalized by Selection Boards working 
with private industry and with government in Great Britain and in 
Australia), the unusual opportunities which such work affords for studying 
normal personality structure, and the ease with which assessment pro- 
cedures such as these lend themselves to the training of junior psychol- 
ogists in the observation, interpretation, and prediction of human be- 
havior, that is, in the integrative processes with which most psychologists, 
both laboratory and clinical, have too little experience. 

It is to be hoped that some far-sighted business or industrial concern, 
some foundation, or best of all, perhaps, some combination of the two, 
will be challenged by the prospect set forth by the Assessment Staff, and 
that organismic psychologists, both theoretical and applied, will have a
real opportunity to develop and exploit the possibilities of their approach.
Perhaps there will soon be a country house for psychological houseparties
on the North Shore of Long Island or somewhere in Westchester! When
the opportunity comes, it is to be hoped that the rules of procedure and
recommendations for control and criteria made by the Assessment Staff 
will be carefully studied. The lessons learned and reported by the 
Assessment Staff should not have to be relearned by each group of 




laboratory and clinical psychologists venturing into the personnel field.
At the same time, their original and creative work should add considerably
to the tools of the vocational psychologist and should broaden his
understanding and deepen his insights. This is a valuable book, worthy of
careful study by students of personality and by students of man at work.

Donald E. Super
Teachers College,
Columbia University


Escalona, Sybylle K. An application of the level of aspiration
experiment to the study of personality. Teach. Coll. Contr. Educ. No. 937.
Pp. vii+132. Cloth $2.10.


A report of the study of the level of aspiration behavior of high school
children is presented in this monograph. Although seventy-eight cases
were originally studied, this report deals with the comparison of nineteen
maladjusted and nineteen well-adjusted subjects. The level of aspiration
techniques used were similar to Jucknat’s, where the subject selects
a task from a series of tasks of graded difficulty placed on a table, but
also included a pre-arranged sequence of success and failure experiences
such as Gardner used. In addition to the usual level of aspiration
scores, decision time and voluntary discontinuation after failure were
also measured. The method also included careful interviewing of the
subjects following the test procedure.

Two of the interesting quantitative findings were concerned with
decision time and voluntary discontinuation. It was found that those in
the maladjusted group spent significantly longer time in deciding what
task to attempt, and they showed a greater tendency to wish to
discontinue the experiment following a failure when given an opportunity to
do so. It was also found that those in the maladjusted group were more
likely to lower their choices after failure than the subjects in the normal
group. In general, the results are interpreted in the topological
framework, and a number of stimulating hypotheses are presented.

The major limitation of the study is the small number of cases. A
second limitation is concerned with the grouping of cases into maladjusted
and well-adjusted. Some of the maladjusted cases were aggressive
delinquents, others were withdrawn daydreamers. Part of the author’s failure
to find consistent results on some measures is probably a function of the
heterogeneous nature of her maladjusted group. The author did
fractionate her maladjusted group into sub-groups for some measures, but
these groups were unfortunately too small to make meaningful statistical
[...] Other limitations of the method for the study of personality
characteristics of individuals were the short number of trials, the method
[...] which discouraged, although it allowed, a repetition of the same



choice following either success or failure, and the loss of flexibility in- 
volved in the prearranged sequence of success and failures. All of these 
factors tended to limit the study of patterns of response and to provide 
only a brief sample of the subject’s behavior in what might be considered 
a free-choice situation. 

The author concludes that although no one-to-one correlation be- 
tween personality characteristics and particular responses in the level of 
aspiration situation can be found, nevertheless, for clinical evaluation the 
method has many advantages. One such advantage is the opportunity 
for the direct study of behavior as contrasted with projective methods 
which involve an additional step of interpretation and consequently a 
potential, additional source of error. 

This is a careful and insightful work, well worth the study of psycholo- 
gists interested in the utilization of level of aspiration techniques for the 
clinical or experimental study of personality. Its many fruitful hypo- 
theses should serve as a stimulation for much needed additional research 
in this field. 


Julian B. Rotter 
Ohio State University 


Goldstein, Naomi F. The roots of prejudice against the Negro in the
United States. Boston: Boston Univ. Press, 1948. Pp. ix+213.
$2.50.


In this book, Dr. Goldstein advances the thesis that present attitudes
of prejudice against the Negro in the United States can only be understood
in terms of the unique position of the Negro as a former member of a slave
class. In developing this point of view, Dr. Goldstein uses an approach
that is becoming more and more prevalent among well-trained students
of the social sciences. The materials and methods of analysis of history,
economics, and sociology, as well as psychology, are brought to bear
upon the problem with the happy consequence that this small volume
gives a well-rounded picture of the many facets of race prejudice.

The book begins with a brief description of the status of the Negro in
the United States today. The concept of race prejudice is then developed
as a tendency to react to an individual not primarily as an individual but
as a member of a racial group. “In examining a situation to determine
[...] race prejudice, the main criterion is reaction to
[...] group rather than to the individual, any attitude at all which meets
this definition must be considered prejudiced” (pp. 34-35). To be free




A critical examination of existing theories of race prejudice is then
[...] Negro in the United States. Current prejudice can be traced
to the institution of slavery, the maintenance of which demanded legal
and psychological separation of white and Negro. The traditions, beliefs,
and attitudes established at the time of slavery became
crystallized [...] norms governing Negro and white relations. With the
emancipation of the Negro, slavery was no longer legally permissible, but
the norms derived from slavery were still in existence and were
perpetuated by the legalization of discriminatory and segregative practices.
Prejudice and hostility thus continue to exist because of the character of
legally supported segregation and discrimination, dating from the period
of Reconstruction.

Of interest to many psychologists will be Dr. Goldstein’s analysis of
the ways in which the norms concerning the Negro are expressed in songs,
[...], newspaper stories, drama, films, fiction, radio, and
advertisements. This analysis reveals three basic stereotypes: the
picture of the Negro as a “contented slave,” the picture of the Negro as
a “[...] barbarian,” and the picture of the Negro as a “comic.” The
[...] has been the most pervasive of the stereotypes and embodies all of
the [...] as to why the Negro was happy as a slave and wretched as a
[...]: his love of fun, his dependency upon white folks, his childish
[...], his inherent laziness, his few and simple needs, and so forth.
These stereotypes have, at all times, served to justify the status of the
Negro as a second-class citizen.

The development and perpetuation of a norm which is essentially
hostile rather than friendly toward the Negro, Dr. Goldstein believes, is
the result of a system of punishments applied to those who refuse to
conform to the norm. For a white person to engage in friendly relations with
[...] results in loss of prestige, status, and other social and
economic [...] “Prejudice against the Negro cannot reasonably be
expected to disappear until segregation, the forced isolation of all
members of one group from all members of the other, is no longer legally
[...]

Allen L. Edwards 


Kaufmann, [...] Your job. New York: Harper & Brothers, 1948.
Pp. [...]+238. $2.75.

Your job, a guide to opportunity and security, is a factual and
common-sense [...]



in the labor market. Although it is optimistically “written for every 
worker” and “lay and professional adviser,” it is too elementary to benefit 
such a wide audience. However, it can definitely be read with profit 
by most entry workers, job explorers, and less experienced counselors 
and personnel workers. The book might also serve as a supplementary 
text in college counseling and guidance courses. 

Early chapters are devoted to a discussion of elementary self-analysis 
and a survey of the world of work, while later chapters deal with such 
topics as wages, personal documents and papers, where to go for infor- 
mation about jobs and job opportunities, and the job interview. There 
are also helpful sections on training and schooling, the role of labor 
unions, setting up your own business, and rights and benefits under 
current social legislation. 

Your job is recommended as a book intended for the inexperienced 
and poorly informed worker and for the fledgling personnel administrator 
or counselor. There is a little unevenness in coverage of some topics and
perhaps several places where we might take exception to the author's 
emphasis, yet these minor criticisms are more than balanced by the 
book’s merits for this specified audience. It covers crucial aspects of 
“your job” in an interesting and readable manner. It presents much in- 
formation that the counselee could profitably read in preparation for a 
counseling interview. It is based on excellent source material (largely 
federal and local government publications). It is a realistic discussion of 
jobs with frank statements about common employer attitudes and sug- 
gestions for handling some of these prejudices. It does not overlook the 
tremendous importance of individual counseling. 

As an introductory source book and orientation to the topic of the
individual and his job, Kaufmann’s book is commendable. 


William A. McClelland 
Brown University 


New Books, Monographs, and Pamphlets

Books, monographs, and pamphlets for listing and possible review should be sent to
Donald G. Paterson, Editor, Department of Psychology, University
of Minnesota, Minneapolis 14, Minnesota


Selecting the new employee. Paul W. Boynton. New York: Harper and
Brothers, 1949. Pp. 136. $2.00.

Personal adjustment in old age. Ruth S. Cavan et al. Chicago: Science
Research Associates, Inc., 1949. Pp. 204. $2.95.

Readings in general psychology. Wayne Dennis, Editor. New York:
Prentice-Hall, Inc., 1949. Pp. 525. $3.75.

An analytical bibliography of modern language teaching. Edited by Robert
Herndon Fife. Washington, D. C.: American Council on Education,
1949. Pp. 549. $5.50.

The validity of commonly employed occupational tests. Edwin E. Ghiselli.
Los Angeles: University of California Press, 1949. Pp. 287. $.75.

Psychology for the profession of nursing. Jeanne G. Gilbert and Robert
[...] Weitz. New York: The Ronald Press Co., 1949. Pp. 275. $3.00.

Man in the primitive world. E. Adamson Hoebel. New York: McGraw-Hill
Book Co., Inc., 1949. Pp. 543. $5.00.

Experiments on mass communication. Carl I. Hovland, Arthur A.
Lumsdaine, and Fred D. Sheffield. Princeton: Princeton University Press,
1949. Pp. 345. $5.00.

Directed thinking. George Humphrey. New York: Dodd, Mead and
Co., 1949. Pp. 229. $3.50.

These are your children. Gladys G. Jenkins, Helen Shacter, and William
W. Bauer. Chicago: Scott, Foresman and Co., 1949. Pp. 192.
$[...]50.

The nature and conditions of learning. Howard L. Kingsley. New York:
Prentice-Hall, Inc., 1949. Pp. 579. $4.50.

Frustration. The study of behavior without a goal. Norman R. F. Maier.
New York: McGraw-Hill Book Co., Inc., 1949. Pp. 264. $3.50.

Guidance policy and practice. Robert H. Mathewson. New York:
Harper and Brothers, 1949. Pp. 294. $3.00.

The psychology of personal adjustment. Second edition. Fred McKinney.
New York: John Wiley and Sons, Inc., 1949. Pp. 752. $6.00.

The polls and public opinion. Norman C. Meier and Harold W. Saunders.
New York: Henry Holt and Co., 1949. Pp. 400. $3.50 College
edition; $4.75 Trade edition.



Psychological testing. James L. Mursell. Second edition. New York: 
Longmans, Green and Co., 1949. Pp. 488. $4.00. 

A manual of pronunciation. Norriss H. Needleman. New York: Barnes 
and Noble, Inc., 1949. Pp. 323. $4.00. 

Child development. Willard C. Olson. Boston: D. C. Heath and Co., 
1949. Pp. 432. $4.00. 

Biology of mental defect. Lionel S. Penrose. New York: Grune and 
Stratton, Inc., 1949. Pp. 270. $4.75.

Christianity and fear. Oscar Pfister. New York: The Macmillan Co., 
1949. Pp. 589. $6.50. 

Effective communication in industry. Paul Pigors. New York: National 
Association of Manufacturers, 1949. Pp. 88. Copies free upon
request.

Experimental psychology. Leo Postman and James P. Egan. New York: 
Harper and Brothers, 1949. Pp. 500. $4.50. 

Opportunities in vocational guidance. Sarah Splaver. New York: Voca- 
tional Guidance Manuals, 1949. Pp. 104. $1.00. 

Children of Brasstown. Celia Burns Stendler. Urbana: Bureau of
Research and Service, College of Education, University of Illinois, 1949.
Pp. 103. $.60.

Introduction to Zen Buddhism. Daisetz Teitaro Suzuki. New York: 
Philosophical Library, 1949. Pp. 136. $3.75. 

Adolescent fantasy. Percival M. Symonds. New York: Columbia Uni- 
versity Press, 1949. Pp. 397. $6.00. 

Experimental psychology. Benton J. Underwood. New York: Appleton-
Century-Crofts, Inc., 1949. Pp. 638. $4.50.

Children with mental and physical handicaps. J. E. Wallace Wallin. New 
York: Prentice-Hall, Inc., 1949. Pp. 576. $5.00. 

Advances in insurance coverage—accident prevention and control. New 
York: American Management Association, 1948. Pp. 39. $.75. 

Appraising and training office supervisors. New York: American
Management Association, 1948. Pp. 39. $.75.

Building quality into manpower. New York: American Management
Association, 1948. Pp. 35. $.75.


DECEMBER, 1949


Study of Executive Leadership in Business. 
I. The R, A, and D Scales 


C. G. Browne* 
Wayne University 


This is the first in a series of papers which will present the following
methods for the study of leadership and executive relationships in
business: R, A, and D scales; social and organizational contacts;
sociometric pattern; and Goal and Achievement index (1).

The total study proceeded on the following hypotheses: (1) leadership
[…] function and leadership in business is a process of the inter[…]
social and working relationships within and outside of the ex[…]
groups; and (3) executive and leader relationships can be ana[…]


Procedure 
The subjects in these explorations were 24 executives of a tire and
rubber company in Ohio, named the Congo Tire and Rubber Company for
purposes of the study. Table 1 includes a listing of the executives by
department. All of the company executives on the first, second, and
fourth echelons of the business, and all of the executives on the third
echelon with one exception were included. Data were obtained in a
[…]tely structured interview, varying in length from 2½ to 3½ hours.
Some of the executives completed the R, A, and D scales during the
interview, while others completed them at another time. In all cases,
the scales were explained during the interview.

R, A, and D Scales 
The R, A, and D index form devised by Stogdill and Shartle in their studies
of Naval leadership consists of six scales, each containing eight
statements (4). Scales A and B are for Responsibility; scales C and D, for
Authority; and scales E and F, for Delegation of authority. The person
completing the forms checks his first and second choices of statements as
they best apply to him on each of the six scales.¹ The following are
examples of the statements for each of the variables: Responsibility, “I am
responsible for the successful operation and coordination of all activities
in the organization”; Authority, “I make no decisions whatsoever but
request instructions from my superior on all matters”; Delegation of
authority, “I have delegated full authority to my assistants, allowing
them complete right of decision in all functions.”

* The writer is indebted to Drs. C. L. Shartle, Harold E. Burtt, and Ralph M.
Stogdill of the Ohio State University for their guidance and criticisms
throughout the study.

Scoring of the individual items on each scale was developed using the
Thurstone equal appearing interval technique (5). To establish scale
values, the statements were evaluated by staff, graduate students, and
seniors in psychology at the Ohio State University. The mean of the
point values of the four statements checked (two on each of the two scales
for a variable) is the score on that variable.
Scale values for the statements range from 1.0 (indicating a high degree
of the factor) to 8.7 (indicating a low degree). Therefore, the lower scores
indicate a higher degree of the item measured, while the higher scores indicate
a lower degree.
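
The scoring rule described above (the mean of the scale values of the four statements checked for a variable) can be sketched as follows. This is an illustration only: the scale values in `SCALE_VALUES`, the function name `variable_score`, and the example choices are hypothetical, since the actual Thurstone values of the published form are not reproduced in this paper.

```python
from statistics import mean

# Hypothetical scale values for the eight statements on scales A and B
# (Responsibility). The real values of the published form are not given here.
SCALE_VALUES = {
    "A": [1.0, 2.1, 3.2, 4.3, 5.4, 6.5, 7.6, 8.7],
    "B": [1.2, 2.3, 3.4, 4.5, 5.6, 6.7, 7.8, 8.5],
}


def variable_score(choices):
    """Return the mean scale value of the four statements checked.

    `choices` maps each of the two scales for a variable to the indices
    of the first- and second-choice statements on that scale.
    """
    values = [SCALE_VALUES[scale][i]
              for scale, picks in choices.items()
              for i in picks]
    if len(values) != 4:
        raise ValueError("expected two choices on each of two scales")
    return mean(values)


# Lower scores indicate a HIGHER degree of the factor.
responsibility = variable_score({"A": (1, 2), "B": (1, 2)})
print(responsibility)
```

Because the scale runs from 1.0 (high degree) to 8.7 (low degree), any comparison of scores has to be read in the reversed direction.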

R, A, and D Scores
The R, A, and D scores of each executive, the mean scores by
departmental and total groups, and the range for each factor are given in Table
1. Remembering that the lower scores indicate a higher degree of the
factor, the score of 1.6 for the president and general manager represents
the highest for both R and A. Likewise, the scores of 5.2 and 5.1 for the
manager of tube sales represent the lowest for R and A, respectively. The
vice-president-sales had the highest D score, and he was also one of
three executives who received the greatest number of choices on the
sociometric diagram. While the secretary of the company had the
lowest D score (6.7), an analysis of his work revealed that he had no one
under his supervision to whom he could delegate the relatively small
degree of authority which he estimated he had.

¹ Requests for information regarding the R, A, and D scales may be addressed
to […] Ohio State University Leadership […].

In any measure (individual scores, departmental means, or total
means) the R scores were almost consistently the highest, followed by
the A scores, and finally the D scores. This indicates a general trend for
the executives to estimate that they delegated authority to a lesser degree
than they estimated either their responsibility or authority, and that
their authority was less than their responsibility. Although the ranges
of the R and A scores were almost identical, there was a concentration
of R scores, there being 22 cases between 2.7 and 3.9, with a mode of

Table 1
R, A, and D Scores

Executive Department and Title        R Score   A Score   D Score
General Administration
  President and genl. manager           1.6       1.6       2.7
  Secretary of the company              3.2       4.3       6.7
  Director public relations             3.3       2.9       3.8
  Purchasing agent                      3.9       3.6       6.4
  Department mean                       3.0       3.1       4.9
Sales
  Vice-president-sales                  2.7       3.5       2.3
  Sales manager                         2.7       2.9       3.0
  Manager Congo stores                  2.7       3.8       4.5
  Manager sales promotion               2.7       3.4       4.9
  Manager sales orders                  2.7       4.4       4.9
  Manager tube sales                    5.2       5.1       5.1
  Department mean                       3.1       3.8       4.1
Finance
  Treasurer                             2.7       4.1       4.9
  Comptroller                           2.7       3.4       3.8
  Supervisor cost accounting            2.9       4.1       5.2
  Chief accountant                      3.3       3.4       6.2
  Department mean                       2.9       3.7       5.0
Manufacturing
  Vice-president-manufacturing          3.0       2.9       3.2
  Plant engineer                        3.7       4.4       4.0
  Chief chemist                         3.7       3.4       3.8
  Product engineer                      2.7       3.6       3.8
  Foreman bicycle tire production       3.7       4.7       2.9
  […] production control                2.9       3.4       4.4
  Manager quality control               2.7       4.5       5.5
  […] shipping                          3.3       4.3       6.0
  Department mean                       3.2       3.9       4.2
Personnel
  Personnel director                    3.2       2.3       2.5


28 for all R scores. The A scores, however, distributed more uniformly, 
with a mode of 4.3, while the D scores had the greatest range, but were 
distributed most uniformly. There were 17 D scores lower than 3.3,
compared with 18 R scores of 3.3 or higher.
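
The counts quoted in this paragraph can be checked against Table 1 with a short tally. The sketch below uses only the 23 rows that are legible in this copy of the table (the text reports 24 executives), so a count that includes the missing row can differ by one. The variable and function names are illustrative.

```python
# R and D scores transcribed from the legible rows of Table 1, in table order.
R_SCORES = [1.6, 3.2, 3.3, 3.9,                       # General Administration
            2.7, 2.7, 2.7, 2.7, 2.7, 5.2,             # Sales
            2.7, 2.7, 2.9, 3.3,                       # Finance
            3.0, 3.7, 3.7, 2.7, 3.7, 2.9, 2.7, 3.3,   # Manufacturing
            3.2]                                      # Personnel
D_SCORES = [2.7, 6.7, 3.8, 6.4,
            2.3, 3.0, 4.5, 4.9, 4.9, 5.1,
            4.9, 3.8, 5.2, 6.2,
            3.2, 4.0, 3.8, 3.8, 2.9, 4.4, 5.5, 6.0,
            2.5]


def count_between(scores, low, high):
    """Count scores falling in the closed interval [low, high]."""
    return sum(low <= s <= high for s in scores)


# A numerically larger score means a LOWER degree of the factor.
r_concentrated = count_between(R_SCORES, 2.7, 3.9)
d_lower_degree = sum(s > 3.3 for s in D_SCORES)    # "lower than 3.3" in degree
r_higher_degree = sum(s <= 3.3 for s in R_SCORES)  # "3.3 or higher" in degree

print(r_concentrated, d_lower_degree, r_higher_degree)
```

On the legible rows this tally gives 17 D scores and 18 R scores, as reported, and 21 R scores between 2.7 and 3.9. The last figure is one short of the 22 reported, which is consistent with one executive's row being missing from this copy.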

While these scores cannot be considered to be predictive of executives
in other companies on the basis of work done, or departmental
assignment, or echelon level, the method offers opportunities to study working
relationships between executives which may be related to any of these
variables. As a measure of communication within the company and of
other personal relationships, the R, A, and D scales offer further
possibilities. These measures might be obtained by a study in which an
executive’s seniors complete R and A forms for him and his juniors complete
D forms for him. A comparison of these scores with the executive’s own
forms would constitute a measure of the individual’s understanding of his
responsibility and authority from the seniors who determine them and of
his delegation of authority from the juniors to whom delegation is made.


R, A, and D Relationships 


Table 2 includes correlation coefficients between R, A, and D and other
variables used in the study. These correlations are descriptive only of the
relationships existing between the variables for this particular population of
executives. They cannot be interpreted as sampling statistics, since the
group of executives studied here does not constitute a statistical sample.


Table 2

Product Moment Inter-correlations of R, A, and D Scores
and R, A, and D Scores Correlated with Other Variables

Variable (N = 24)                     R         A         D
R (Responsibility)                    x        .56       .29
A (Authority)                        .56        x        .54
D (Delegation of authority)          .29       .54        x
Time spent in supervision**         -.06*     -.25*     -.12*
Number of choices***                 .29*      .28*      .48*
Executive's salary                   .48*      .41*      .49*
Executive's echelon                  .34       .40       .14

* The sign for this correlation has been changed so that in interpreting the
correlations a large score in one variable is also indicative of a large score
in or a greater degree of the second variable.
** This variable was expressed in the per cent of the executive's total time
which he estimated he spent in supervision.
*** As determined from the sociometric diagram.
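
The sign change described in the first footnote to Table 2 can be illustrated with a small sketch. Because a low scale score means a high degree of the factor, the raw product-moment correlation between a scale score and a variable such as salary comes out negative when greater degree goes with greater salary; reversing the sign restores the intuitive reading. The data below are invented for illustration and the function names are assumptions, not part of the original study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def reflected_r(scale_scores, other):
    """Reverse the sign so that a positive value means that a greater
    degree of the factor goes with a larger value of the other variable."""
    return -pearson_r(scale_scores, other)


# Invented example: five D scores (low score = much delegation) and
# hypothetical salary ranks on an arbitrary scale (5 = highest salary).
d_scores = [2.3, 3.0, 4.5, 4.9, 5.1]
salary = [5, 4, 3, 2, 1]

print(pearson_r(d_scores, salary))    # negative as computed
print(reflected_r(d_scores, salary))  # positive after the sign change
```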

The inter-correlations between the three factors were .56 for R and A;
.29 for R and D; and .54 for A and D. In the studies of Naval
leadership, unpublished correlations for a group of 40 Naval officers were found
to be .56 for R and A; .16 for R and D; and .86 for A and D. These
comparative correlations between the business executives and the Naval
officers indicate the same general trend in the inter-correlations between
factors, although the Navy correlation of .86 for A and D was
considerably higher than the executive correlation of .54 for the same factors.
Part of this difference may have been due to the possibility that such
concepts as authority and the delegation of it are more clearly defined for
Navy personnel than they are for business executives, and that they
are measured and weighed with greater absoluteness in the military
environment.
The correlation between authority and time spent in supervision, the
largest of the three negative correlations between these variables, was
-.25. This indicates that the executive who devoted a greater
percentage of his time to supervisory activities, as contrasted with such
other activities as planning or coordination or evaluation, tended to have
an A score which represented a lesser amount of authority.
In a later paper, the sociometric pattern which was used in the study
will be discussed. The “number of choices” variable was determined
from the listing which each executive made of the men with whom he
spent most time in getting his work done. In the sociometric diagram,
the greatest number of choices was received by three executives who
were in the second echelon. The relatively high positive correlation of
.48 between number of choices and D score indicates that those men who
were consulted most and with whom most time was spent in getting work
done also tended to be the men who were delegating authority to the
greatest degree.
The correlations between R, A, and D scores and salary all indicate
that executives with the higher salaries tended to have scores which
indicated a greater degree of the three factors. In view of the low
correlation between echelon and D score, the relatively high correlation of
.49 between salary and D score may be surprising. However, this
correlation is strongly influenced by the fact that several of the executives on
the fourth echelon were receiving higher salaries than some of the
executives on the second and third echelon.
The correlations between R, A, and D scores and echelon were not as
high as they were with salary. However, it is quite logical that the
correlations with echelon were highest for responsibility and authority, since
it can be expected that the higher level executives would have a higher
score on these factors. Delegation of authority, on the other hand, is an
individualized factor, not greatly related to the executive’s echelon.


Summary 
The R, A, and D scales introduced by Stogdill and Shartle in their
studies of Naval leadership were applied to a group of 24 executives in a
tire and rubber manufacturing company. The scores for each executive
on each of the three factors provide a measure of the individual’s
evaluation of his responsibility, authority, and delegation of authority. From
their scores, these executives estimated that their responsibility and
authority were greater than their delegation of authority.

Since the factors measured by the R, A, and D scales are particularly 
important at the executive level, a quantitative method such as presented 
here should aid in the analysis and understanding of executive functioning 
and business leadership. This is based upon the general hypothesis that 
leadership and executive activity are dependent upon social and working 
relationships in group activities, and that their study from this approach 
will prove more helpful than the analysis of individual characteristics 
with psychological trait testing methods has proven. 


Received March 14, 1949.

References

1. Browne, C. G. An exploration into the use of certain methods for the study of executive
function in business. Unpublished Ph.D. dissertation, The Ohio State University,
1948.

2. Jenkins, W. O. A review of leadership studies with particular reference to military
problems. Psychol. Bull., 1947, 44, 54-79.

3. Stogdill, R. M. Personal factors associated with leadership: a survey of the
literature. J. Psychol., 1948, 25, 35-71.

4. Stogdill, R. M., and Shartle, C. L. Methods for determining patterns of leadership
behavior in relation to organization structure and objectives. J. appl. Psychol.,
1948, 32, 286-291.

5. Thurstone, L. L., and Chave, E. J. The measurement of attitude. Chicago: Univ. of
Chicago Press, 1929.


An Analysis of the Leaderless Group Discussion


Bernard M. Bass
Louisiana State University


Several studies in recent years have reported the use of the leaderless
group discussion situation as an aid in selecting candidates for positions
involving leadership (1, 2, 3, 4). Little has been done to evaluate this
technique quantitatively or to investigate the possibility of making
objective measures of individuals in this situation. The purposes of the
present study were to investigate the extent of agreement among raters of
discussion participants and the relation between the total amount of time
a participant spent talking¹ in the leaderless group discussion, and the
ratings he obtained.


Subjects, Method and Apparatus 


A class of 20 educational psychology students served as subjects.
Twelve were men and 8 were women. They ranged from freshmen to
graduate student, with a median of 2 years college education. Several
students had 2 or more years of teaching experience.
A total of 6 leaderless group discussions was run in 6 weeks. The 20
subjects were divided randomly into Group A and Group B. Group A
participated in the first discussion while Group B observed the
participants. The two groups of 10 students each switched roles for the second
discussion. Those 10 participants of the first 2 discussions who had been
given the highest leadership ratings for the discussions by their classmates
formed the third leaderless group discussion. The fourth discussion was
composed of the remaining 10 participants who had received the lowest
leadership ratings. The original groups, A and B, were used again for the
last 2 discussions. Group A participated in and Group B observed the
fifth discussion, and Group B participated in and Group A observed the
sixth discussion.
Each of the discussions lasted 20 minutes and was held during class
[…] in the classroom. The 10 participants were seated around the
outside of a V-shaped table. A code numbered place card was put in front
of each of the participants for identification. The 10 observers sat facing
the participants on the other side of the room. To provide adequate
motivation for the participants, each of the discussions was considered a
course examination and grades from A to E were awarded by the
experimenter-instructor.

¹ For a discussion of the use of time spent talking in the individual interview as a
predictive measure, the reader may refer to Chapple, E. D. and Donald, G. A method
for evaluating supervisory personnel. Harv. Bus. Rev., 1946, 24, 197-214.

Oral instructions were as follows: 


“You will be given a problem and will have thirty minutes in which to dis- 
cuss it. You will be graded, not only on how well you as an individual contrib- 
ute to the group discussion, but also on how well the group does as a whole. 

“Everyone may receive an A or everyone may receive an E depending on 
how much he contributes and how much the group progresses. Therefore, if 
you feel someone else is ‘off the track,’ is wasting the group’s time and therefore 
is lowering your grade, feel free to cut in and get the group back on its proper 
assignment.”


The problem presented to a discussion group was pertinent to the 
material on educational psychology which the subjects had supposedly 
studied the night before. It called for the development of a program or 
set of plans which must be sold to another group, such as a school board.
The problem was read aloud to the group twice and then they were told
to begin discussing it. 

One example of the problems used is as follows: 


The School Board of your town has gone progressive. The Board realizes 
that teachers cannot do everything and are planning to obtain a staff of special- 
ists in various areas to cope with the several problems which teachers are unable 
to handle adequately. Consider yourselves as the chairmen of the ten depart- 
ments of your high school of 5,000 students. You are meeting thirty minutes 
before the School Board goes into session. The present high school personnel 
consists of teachers, the principal, an office staff, and a janitorial staff. Your 
problem is to agree upon the four specialists you will ask for, and the reasons
you will present for choosing those four. The School Board will only
appropriate $12,000. And remember, there are 5,000 students, so don’t plan on
overloading the four specialists.


For the first 3 discussions, the experimenter used a stop-watch to clock 
the number of seconds each participant talked and then recorded the 
measurement under the participant’s name on a log sheet.² It was found
difficult to keep up with the constant shift in speakers, especially during 
arguments. Interruptions and pauses within speeches also increased the 
difficulty of measuring and recording. Fortunately, the differences 
among participants were large enough to allow one to assume that the 
measurement errors were not of such an order as to warrant discarding 
the results of the first 3 discussions. 

² Wire recordings were not used because of the difficulties of dubbing in the
speakers’ code numbers which would have been necessary in order to later identify
who was speaking at a given time.

In order to increase the time data precision, the Group Discussion
Chronometer was designed by the investigator and introduced into the
study in the fourth discussion. The GDC consisted of a panel of 10
push-button switches spaced to allow each of the experimenter’s fingers to
operate 1 button without having to move his hands. Each push-button
closed the circuit of 1 of 10 sweep-hand, self-starting, electric clocks
mounted on a board outside of the classroom and connected by a cable
to the switch panel. One button and 1 clock were devoted to each
participant, and a cumulative measure, in seconds, was obtained of the total
time each participant talked. The experimenter pressed the button
appropriate to the participant each time the participant began talking and
released it when the speaker stopped or paused. Unlike the stop-watch,
the GDC could record individual times even when two participants
began to talk at the same time.
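
The bookkeeping performed by the ten clocks can be mimicked in a few lines. The sketch below is an illustration under stated assumptions, not a description of the apparatus itself: each button press and release is represented as a hypothetical `(code_number, start_second, stop_second)` record, and the function name `cumulative_talk_time` is invented. As with the separate clocks, overlapping speech by two participants is credited to both.

```python
from collections import defaultdict


def cumulative_talk_time(intervals):
    """Sum talking time per participant.

    `intervals` is an iterable of (code_number, start, stop) records in
    seconds, one per button press. Intervals for different participants
    may overlap, since each participant has a separate clock.
    """
    totals = defaultdict(float)
    for code, start, stop in intervals:
        if stop < start:
            raise ValueError("button released before it was pressed")
        totals[code] += stop - start
    return dict(totals)


# Invented log: participant 7 cuts in while participant 3 is still talking.
log = [
    (3, 0.0, 12.5),
    (7, 10.0, 18.0),
    (3, 18.0, 25.0),
]
totals = cumulative_talk_time(log)
print(totals)
```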

Both participants and observers rated participants by 3 different
methods on 13 items. The 3 rating methods were the “Spread N,”
which was used for all 6 discussions; the “3 Best minus 3 Worst,” which
was used for the third discussion; and the “rank order of merit,” which
was used for the fourth discussion. All ratings were recorded on a
prepared inventory by the raters immediately after the conclusion of a
discussion.
When 10 participants were to be rated, the “Spread N” technique³
was as follows:

A rater gave 1 vote to each of the 10 participants if all participants were
considered equal. Any other distribution of the 10 votes was possible.

When the rank order of merit method was used, the raters were
instructed to rank the 10 participants on each of the 13 items. For the
“3 Best minus 3 Worst” method, the raters were told to select the 3
highest and 3 lowest participants on each of the items.

The 13 items upon which appraisals were made were as follows:

Vote for the person or persons:

1. Whom you think led the discussion.
2. Whom you think knew most about the topic discussed.
3. Whom you think most influenced the other participants in the discussion.
4. Who most clearly defined the problems, who brought them into sharp
focus, and who best organized the group’s thinking during the discussion.
5. Whom you would select to be superintendent of schools if he (she or they)
had the proper experience and training.
6. Whom you like best.
7. Who offered the best solutions to the problems discussed.
8. Whom you would like to see as chairman or head of the department.
9. Who most motivated the others to participate in the discussion.
10. Who seemed most interested in the discussion.
11. Whose class you would like most to be in, if all the participants were
teachers.
12. Whom you would select to address an audience of teachers.
13. Who should get the best grade for today’s discussion.

³ The reader can recognize a similarity between this technique and the pooled
judgement method for differentially weighting traits of a composite criterion described in
Burtt, H. E. Principles of employment psychology. New York: Harper, 1942, 354.


For the “Spread N” rating procedure, the total number of votes
received from all those who rated a participant was divided by the number
of raters to obtain his mean rating. To obtain an individual’s rating by
means of the “3 Best minus 3 Worst” technique, the number of “worst”
votes received from all the raters was subtracted from the number of
“best” votes. For the ranking procedure, a participant’s average rank
was computed. As will be shown later, there seemed little value in
computing mean ratings for each of the thirteen items, item by item, as they
all seemed to be measuring the same variable, leadership status.
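
The three scoring rules just described can be written out as a short sketch. The ballot formats, participant labels, and function names below are assumptions made for illustration; the paper specifies only the arithmetic.

```python
def spread_n_mean(ballots, participant):
    """'Spread N': total votes received divided by the number of raters.

    Each ballot maps participant -> votes; a rater distributes N votes
    (10 for 10 participants) in any way he chooses.
    """
    return sum(b.get(participant, 0) for b in ballots) / len(ballots)


def best_minus_worst(ballots, participant):
    """'3 Best minus 3 Worst': 'best' votes minus 'worst' votes."""
    best = sum(participant in b["best"] for b in ballots)
    worst = sum(participant in b["worst"] for b in ballots)
    return best - worst


def average_rank(rankings, participant):
    """Rank order of merit: mean position across raters (1 = highest)."""
    return sum(r.index(participant) + 1 for r in rankings) / len(rankings)


# Invented ballots from two raters.
spread = [{"P1": 4, "P2": 3, "P3": 3}, {"P1": 6, "P2": 4}]
three_three = [
    {"best": {"P1", "P2", "P3"}, "worst": {"P8", "P9", "P10"}},
    {"best": {"P1", "P4", "P5"}, "worst": {"P3", "P9", "P10"}},
]
ranked = [["P1", "P2", "P3"], ["P2", "P1", "P3"]]

print(spread_n_mean(spread, "P1"))
print(best_minus_worst(three_three, "P3"))
print(average_rank(ranked, "P1"))
```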


Agreement Among Raters 


It was felt unprofitable to compute the 190 intercorrelations between
raters because of the small number of ratees. Participants’ total “Spread
N” ratings in the first, second, fifth, and sixth discussions assigned by a
given rater, were converted instead into ranks, and the average
intercorrelation of all rank orders was computed to give a rough appraisal of the
extent of agreement among judges (See Woodworth (5), p. 372 ff.). The
average intercorrelations obtained for the first and last two discussions
were .72, .61, .63, and .41 respectively.
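
The quantity being summarized here can be illustrated by brute force. The sketch below averages the rank-difference (Spearman) correlation over every pair of raters. This is an assumption about the statistic: the text cites Woodworth for the method, which may use a shortcut formula rather than pairwise computation, and the formula used here holds only for rankings without ties. It also confirms that 20 raters yield 190 pairs.

```python
from itertools import combinations
from math import comb


def spearman_rho(ranks_a, ranks_b):
    """Rank-difference correlation for two untied rankings of the same ratees."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def average_intercorrelation(rank_orders):
    """Mean rho over all pairs of raters."""
    pairs = list(combinations(rank_orders, 2))
    return sum(spearman_rho(a, b) for a, b in pairs) / len(pairs)


# Number of rater pairs for 20 raters.
print(comb(20, 2))

# Invented rankings of four ratees by three raters.
raters = [[1, 2, 3, 4], [1, 2, 3, 4], [4, 3, 2, 1]]
print(average_intercorrelation(raters))
```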

The correlation between combined participants’ ratings and combined
observers’ ratings of each participant was another measure of the extent
of agreement among raters. When participants’ ratings for the first 2
discussions were combined into 1 distribution, there was a correlation⁴ of
[…] between ratings assigned to participants, by participants, and by
observers.


Table 1

The Rate-Rerate Reliabilities of Nineteen Judges Rating Participants in
Discussions I and II and Six Weeks Later in Discussions V and VI

Judge   r              Judge   r              Judge   r
  1    .06               8    .65              15    .63
  2    [illegible]       9    .53              16    .86
  3    [illegible]      10    .86              17    [illegible]
  4    .48              11    .69              18    .41
  5    .79              12    .72              19    .53
  6    [illegible]      13    [illegible]      20    .72
  7    [illegible]      14    .87

Retest and Rerate Reliability 


Each judge’s ratings of participants of the first 2 discussions were
correlated with the ratings assigned by the judges to the same individuals
when they acted as participants of the last 2 discussions 6 weeks later.
The fifth discussion was among the same participants as the first, and the
sixth discussion was among the same participants as the second. Table 1
shows the 19 rate-rerate reliability coefficients obtained. By converting
the coefficients into Fisher’s Z-function, a mean coefficient of .72 was
obtained. There was a correlation of .87 between the total time
participants talked in the first or second discussion, and the time they talked in
the fifth or sixth discussion.
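
Averaging correlation coefficients through Fisher's Z-function, as mentioned above, amounts to transforming each r with the inverse hyperbolic tangent, averaging the transformed values, and transforming the mean back. The following is a minimal sketch of that procedure; the function name and the example coefficients are invented and are not the Table 1 values.

```python
from math import atanh, tanh


def fisher_mean(coefficients):
    """Mean correlation obtained by averaging in Fisher's z metric."""
    z_values = [atanh(r) for r in coefficients]   # z = arctanh(r)
    return tanh(sum(z_values) / len(z_values))    # back-transform the mean z


# Invented coefficients for illustration.
example = [0.0, 0.9]
print(fisher_mean(example))        # larger than the arithmetic mean
print(sum(example) / len(example))
```

Because the transformation stretches values near 1, the z-based mean of unequal coefficients is pulled toward the larger coefficients compared with a plain arithmetic mean.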
The 10 subjects of the 20 who participated in the first 2 discussions
and received the highest ratings were placed in the third discussion. The
other 10 subjects were placed in the fourth discussion. Despite some re[…]
those in the highest leadership status in discussions I or II became
“leaders of leaders” in discussion III and the highest “followers” in
discussions I or II became the “leaders of the followers” in discussion IV,
[…] rank order correlations between participants’ ratings in discussions I or
II and III or IV were .78 and .76 respectively. The conclusion may be
drawn that despite the change in groups and restriction of range,
leadership tended to be generalized from one leaderless group discussion
to another. There seemed to be consistency in both the behavior of
participants and the ratings they received in the leaderless group
discussion.


Interrelationships Among Variables 


The number of votes assigned to participants of discussion I or II by
[…] were combined, item by item, and correlated with the time
participants spent talking. Table 2 shows the results obtained. When
[…] obtained on all 13 items were correlated with time spent talking,
the coefficient obtained was .93. Because of the high correlations
obtained between each item and time spent talking, it was thought
unnecessary to compute the matrix of intercorrelations, as all seemed to be
measuring the same factor, i.e. leadership status. The unidimensionality may
be due in part to halo effect.
Discussion V and VI showed similar results. The correlation between
[…] leadership status ratings was .86. Taking into account the
[…] reliabilities mentioned previously, it seems that time
spent talking and leadership status are closely associated and that this
close association may be generalized from one leaderless group discussion
to another.

⁴ Unless otherwise stated, correlations were obtained by the Pearson product-moment
method.


Table 2

Correlations Between Time Spent Talking in the First Two Discussions and
Number of Votes Received on Each of Thirteen Leadership Items

Item    r        Item    r        Item          r
  1    .91         6    .85        11          .86
  2    .87         7    .92        12          .87
  3    .90         8    .82        13          .92
  4    .89         9    .90        All Items
  5    .85        10    .87        Combined    .93


Comparison Among Rating Methods 


The effects of using a different rating technique were negligible. In 
discussion III there was an almost perfect linear relationship between 
“Spread N” ratings assigned by 20 raters, and the “3 Best minus 3 
Worst” ratings. In discussion IV, an almost perfect curvilinear relationship
of the order y = a + b log X was found to exist between ranking and
the “Spread N” ratings, but whereas the former tended to distribute
individuals evenly, the “Spread N” like the “3 Best minus 3 Worst”
method tended to dichotomize participants. The “Spread N” seemed to
scatter out the leaders more widely, while the “3 Best minus 3 Worst”
best spread out the followers. From a realistic approach, on the basis of 
this extremely small sample, the “Spread N” would seem to be the most 
valuable as a technique for selecting from the top end of a distribution. 
Leaders were distributed by the “Spread N” in the same manner as on the 
variable of time spent talking. The correlation between leadership 
status and time spent talking was therefore highest with this rating 
method. Since the leaderless group discussion was designed primarily to 
discriminate among leaders, the “Spread N” best approaches the needs 
of the situation and the correlation obtained when using this rating
method does not appear to be spurious to any great extent.


Interpretation and Conclusions 


Several hypotheses can be drawn for further investigation from the 
results obtained and from personal observations of the leaderless group 
discussion. 

1. If a group is given a verbal problem, with suitable motivation to 
cooperate and achieve the goals relevant to the problem, a differentiation 
of function will occur within the group. 


2. In the leaderless group discussion, one task may be assumed by
several […]; some tasks may be assumed by one; some tasks may not be
assumed at all. These tasks include initiation or formulation of the
[…] and goals, organization of the group’s thinking, clarifying other
[…] responses, integrating responses of several individuals,
[…] motivating others to respond, accepting or rejecting other
[…] responses, outlining the discussion, summarizing,
generalizing, […] the group’s agreement and formulating conclusions.

3. Because of the verbal nature of the situation, the more tasks an
[…] the more time he is forced to spend in talking to the
group.

4. It is assumed that those individuals who carry out the
above-[…] tasks are perceived by others to be the leaders of the group.

5. If the above hypotheses are correct, then the time an individual
[…] talking in the leaderless group discussion is indicative of his
[…] leader or follower in that group situation.

Received […] 11, 1949.

References

1. […], W. Judging candidates by observing them in unsupervised group discussion.
Personnel J., 1947, 26, 170-173.

2. […] M. New-type selection boards in industry. […] Psychol., 1947, […],
[…]-178.

3. […] Assessment Staff. Assessment of men. New York: Rinehart, 1948.

4. […] Use of the “group situation observation” method in the selection of trainee
executives. J. appl. Psychol., 1948, 32, 587-594.

5. Woodworth, R. S. Experimental psychology. New York: Holt, 1938.


Performance on the File-Remmers Test, How Supervise? 
Before and After a Course in Psychology 


Harry W. Karn 
Carnegie Institute of Technology 


A recently developed instrument for the measurement of the attitudes 
and understandings necessary for supervisory success is the File-Remmers
questionnaire, How Supervise? The construction of the original forms 
of this test is described in an article by File (1). In a summary of the 
preliminary evidence on the validity of the device, File and Remmers (2) 
report significant increases in scores after supervisory training and sig- 
nificant differences in scores between successful supervisors and individ- 
uals by-passed because of lack of supervisory ability. 

An indication that the test is lacking in universal validity is the study 

by Sartain (4). This investigator reports the test to have little or no 
predictive value for success in supervision in an aircraft factory. In an 
attempt to reconcile these findings with their positive evidence, File and 
Remmers (2) question whether Sartain’s 40 supervisors in an expanded 
war industry are sufficiently typical to be used as cases for drawing gen- 
eral conclusions concerning the usefulness of tests in selecting super- 
visory personnel. 
In view of the evidence reported to date, further studies appear to be
in order before the question concerning the universal validity of the test 
can be settled. As a contribution toward this end, the following report 
is presented of an investigation designed to determine the effect of a 
psychology course upon college students’ understanding of supervisory 
skills and practices as measured by the File-Remmers test. 


Procedure 

Forms A plus B of the test were administered under standard instructions
to 108 students (104 males) in the College of Engineering and
Science at Carnegie Institute of Technology during the first week of a
first semester required course in psychology. About 98 per cent of this
group consisted of students in their junior college year with the remainder
consisting of irregulars in the sophomore and senior years. These students
made up the training group. The same tests were administered to a
comparable control group of 104 students (101 males) during the first
week of a required course in English. During the last week of the
semester both groups were again administered both forms of the test.

During the interim between testings the specific issues covered in the
test were not discussed in either the psychology or English courses.
Scores were not divulged nor the purpose of the investigation mentioned
until after the final testings.

The training group consisted of five sections taught by four different
instructors. This group took a three-credit course in general psychology.
[...] of the instructors used standard psychological text books although the
emphasis throughout the course was not upon text book content per se
but upon the application of psychological principles to adjustment
problems, particularly those likely to be encountered in human relations
situations in industry. To this end, realistic case problems were dealt
with and the student encouraged to solve them through the use of
psychology and a systematic, analytical problem solving procedure.

The control group was made up of five sections of a three-credit course
in English literature taught by five different instructors.

A feature of the investigation is the use of a control group with which
to compare changes in test performance by the training or experimental
group. This type of experimental design is mandatory if valid conclusions
concerning the effectiveness of any training program are to be
drawn. Changes in performance on the part of the experimental group
can be attributed to the training program only if these changes are
significantly different from any changes that may appear in the control
group.


Results

Table 1 summarizes the essential statistical data for a comparison of
scores made by the training and control groups on Forms A plus B of the
test during the original and terminal testings. These data indicate a
slight difference in mean raw scores between groups on the first test, an
insignificant increase on the part of the control group on the retest and
a highly significant increase of nine points of raw score on the retest by
the training group. The nine point increase in mean raw score at this
level is equivalent to a shift from about the 50th percentile to about the
70th percentile, according to the File-Remmers norms for Higher Level
Supervisors (3). A comparison between retest scores of the training and
control groups shows a difference of about five and a half points in mean
score which is significant at nearly the one per cent level of confidence.
This comparison, however, obscures the total gain made by the training
group since this group had a lower mean (95.3) than the control group
(97.1) at the time of the initial testing. The full extent of the gain made
by the training group is evident in the ensuing comparative analysis which
treats the data in terms of absolute differences in scores between testings.


Table 1

Comparison of Initial and Terminal
Scores on Forms A plus B for
Training and Control Groups

                                              Critical   r, Initial and
Group       N     Testing    Mean     S.D.    Ratio*     Terminal Scores

Training    108   Before      95.3    15.0
                  After      104.5    13.3    7.48       .61
Control     104   Before      97.1    16.9
                  After       98.9    17.9    1.70       .81

* Critical ratio of difference between groups on first test, .81. Critical ratio of
difference between groups on terminal test, 2.54.
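
The within-group critical ratios in Table 1 can be checked approximately. The article does not state the formula it used, so the sketch below assumes the customary standard error of a difference between correlated means, built from the two standard deviations, the group size, and the correlation between initial and terminal scores. The function name is hypothetical.

```python
import math

def critical_ratio_correlated(mean_before, sd_before, mean_after, sd_after, r, n):
    """Critical ratio for the change in mean of one group tested twice.

    Assumed formula (not given in the article):
    SE_diff = sqrt(SE1^2 + SE2^2 - 2 * r * SE1 * SE2), with SE = S.D. / sqrt(N).
    """
    se_before = sd_before / math.sqrt(n)
    se_after = sd_after / math.sqrt(n)
    se_diff = math.sqrt(se_before**2 + se_after**2 - 2 * r * se_before * se_after)
    return (mean_after - mean_before) / se_diff

# Values taken from Table 1 (Forms A plus B).
training = critical_ratio_correlated(95.3, 15.0, 104.5, 13.3, 0.61, 108)
control = critical_ratio_correlated(97.1, 16.9, 98.9, 17.9, 0.81, 104)
print(round(training, 2), round(control, 2))
```

Under this assumption the control group comes out at about 1.70, matching the table, and the training group at about 7.6 against the tabled 7.48. The small gap is consistent with the correlation being reported to only two decimals.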

A summary of the analysis made in terms of the differences between 
scores on the initial and terminal testings for training and control groups 
is presented in Table 2. These data reveal the average difference be- 
tween the initial scores and the higher terminal scores for the training 
group to be significantly greater than a comparable measure for the con- 
trol group. 

Table 2

Comparison of Training and Control Groups in Terms of
Differences Between Initial and Terminal Testings

                         Training        Control
N                        108             104
Mean of Differences      9.3 (gain)      1.6 (gain)
S.D.                     12.8            10.9
Critical Ratio           4.73
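
The critical ratio in Table 2 compares the mean gains of two independent groups. As a hedged check, the sketch below assumes the usual large-sample formula, the difference between means divided by the square root of the sum of the squared standard errors. The article does not print its formula, and the function name is hypothetical.

```python
import math

def critical_ratio_independent(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Critical ratio for the difference between means of two independent groups."""
    # Assumed standard error of the difference: sqrt(sd_a^2/n_a + sd_b^2/n_b).
    se_diff = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / se_diff

# Mean gains, S.D.s and Ns from Table 2 (training vs. control).
cr_gain = critical_ratio_independent(9.3, 12.8, 108, 1.6, 10.9, 104)
print(round(cr_gain, 2))
```

With the tabled values this gives about 4.72, in close agreement with the reported 4.73.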


Because of greater reliability, both forms of the test are recommended 
by its designers over the single form. It is reported, however, that the 
single form yields scores which are sufficiently reliable for gaining informa- 
tion about a group as a whole. It appeared worthwhile, therefore, to 
compare the double form analysis from the present study with the data 
from single forms. Scores on Form B from the initial testings were com- 
pared with scores made on the same form on the terminal testings. Form 
B was used because any possible practice effects from having taken
another form of the test would be equated on both initial and terminal
testings. 

Table 3 is a summary of the data from the single form analysis. There
is little difference between mean scores of the two groups on the initial
test, an insignificant increase by the control group on the retest, and a
[...] significant increase of four points of raw score on the retest by the
training group. The increase of four points at this level is equivalent to a
shift from the 55th percentile to the 70th percentile, according to the
File-Remmers norms (3). The difference of nearly three points in mean
score between groups on the retest is significant at about the five per cent
level of confidence.


Table 3

                                              Critical   r, Initial and
Group       N     Testing    Mean     S.D.    Ratio*     Terminal Scores

Training    108   Before     50.3      8.0
                  After      54.1      7.5    6.61       .61
Control     104   Before     50.8     10.3
                  After      51.5     10.6     .85       .68

* Critical ratio of difference between groups on first test, .43. Critical ratio of
difference between groups on terminal test, 2.04.

The full extent of the gains made by the training group on the single
form is revealed in the summary of the data presented in Table 4.
These data show the average difference between the initial and terminal
scores for the training group to be greater than the comparable measure
for the control group at about a one per cent significance level.


Table 4

Single Form B Comparison of Training and
Control Groups in Terms of Differences
Between Initial and Terminal Testings

                         Training        Control
N                        108             104
Mean of Differences      4.1 (gain)      1.4 (gain)
S.D.                     7.5             [illegible]
Critical Ratio           2.56


In general, the results of the single form analysis corroborate those
of the analysis of data from both forms although the differences for the
double form attain a higher level of statistical significance than those for
the single form.


Discussion

On the assumption that a psychology course designed to improve
understanding of the principles of human behavior in industrial situations
[...] accomplishes this goal, the superior terminal questionnaire
performance of the students having taken this course would indicate that the
instrument is measuring those skills having to do with an understanding 
of the principles of the successful management of human relationships.
In support of this conclusion is the absence of significant improvement in 
scores on the part of a comparable group of students tested at the same 
times but without having taken the psychology course between testings. 
Since good supervision presumably requires a knowledge of human rela- 
tions skills, the present study can be interpreted as evidence for the 
validity of the File-Remmers test as a means of measuring one aspect of 
supervisory success. 

The present investigation is a contribution towards the establishment 
of the universal validity of the test since it deals for the first time with
college students under academic instruction. Previous studies, which 
have shown positive gains with training, have been concerned with the 
effects of specific industrial training programs among on-the-job super- 
visory personnel. The demonstration of improved scores under a variety 
of training conditions indicates that the responses are the result of the 
application of basic principles to the problems rather than the acquisition 
of specific answers to the questionnaire items. 

High scores on a test of the type under discussion are no guarantee 
that the individuals making such scores will be good supervisors. There 
must be additional evidence from other sources that such individuals will 
put into practice the knowledge they possess. The argument for the use 
of the test is still good, however, for obviously individuals cannot put into 
practice knowledge that they do not have. The final answer to the 
question of measuring and predicting success in supervisory situations 
will probably take the form of a composite index based on test scores, bi- 
ographical data and on-the-job ratings by qualified observers. 


Summary 

A group of 108 college students in their junior year were administered 
Forms A plus B of the File-Remmers questionnaire, How Supervise? 
before and after a course in psychology. The same tests were administered
to a comparable control group of 104 students before and after a
course in English literature.

An analysis of the data in terms of mean scores for both groups on
initial and terminal testings and in terms of differences between scores on
the two testings shows significant gains in favor of the psychology group.
This is true for both the double form data, i.e., the comparison of scores
on Forms A plus B on initial and terminal tests, and the single form
comparison of scores on Form B only.


Received March 7, 1949.




References

1. File, Q. W. The measurement of supervisory quality in industry. J. appl. Psychol.,
1945, 29, 323-337.

2. File, Q. W., and Remmers, H. H. Studies in supervisory evaluation. J. appl.
Psychol., 1946, 30, 421-425.

3. File, Q. W., and Remmers, H. H. How Supervise? (revised manual) New York:
Psychological Corporation, 1948, pp. 8.

4. Sartain, A. Q. Relation between scores on certain standard tests and supervisory
success in an aircraft factory. J. appl. Psychol., 1946, 30, 328-339.


The Prediction of Accidents of Taxicab Drivers 


Edwin E. Ghiselli and Clarence W. Brown 
University of California 


The long history of investigations concerned with the effectiveness of 
tests in the prediction of accident proneness among vehicle operators
might lead one to suspect that a wealth of information exists on this sub- 
ject. Examination of typical reviews of the literature, however, points 
up the fact that empirical evaluations are by no means extensive and are 
quite restricted with respect to types of tests, being almost wholly con-
cerned with apparatus tests (2, 5, 7). There appears to be general agree-
ment that several kinds of reaction time tests, particularly those involving
more complex functions, are the most useful. The value of tests of
sensory acuity and perception is less certain. In the paper and pencil
test field only intelligence tests have had more than a cursory examination. 
The validity of this type of predictor is rather low, with a validity coeffi- 
cient of the order of about .15. 

In many situations apparatus tests cannot be employed in the selection
of vehicle operators. Both initial and maintenance costs often are too 
high. In certain cases applicants must be tested in large groups, and in 
others personnel capable of operating apparatus tests are not available. 
Indeed, in some instances sheer lack of space for a permanent set up of 
testing equipment is an obstacle. However one might view the desirabil- 
ity of apparatus tests as compared with the paper and pencil variety, 
financial, administrative, and physical limitations may be the deciding 
factor in the choice of the type of tests that can be utilized. This leads, 
then, to a need for a closer scrutiny of the data available concerning the 
effectiveness of paper and pencil tests. 

The low validity of intelligence tests cited earlier can be considered to 
assume some significance if only the very few superior individuals in a
large number of applicants are to be considered. The Personnel Research
Section of the Adjutant General's Office has reported eight validity
coefficients for intelligence tests and nine for mechanical principles tests
relative to performance on various types of road tests (6). In the large
majority of these studies the coefficients were based on more than one
hundred cases. For intelligence tests the validity coefficients range from
.03 to .33, with a median of .18, and for mechanical principles tests the
range of validity coefficients is from .00 to .40, with a median of .20.


Since driving skill might be expected to be related to safety of operation,
these coefficients at least are suggestive that paper and pencil tests might
be helpful in predicting accidents. The present authors, together with
E. W. Minium, investigated the usefulness for street car motormen of a
battery of tests, including some of the paper and pencil variety (4). The
criterion in this study was accidents for an eight-month period. Paper
and pencil tapping and dotting tests were found to be the most valid,
exceeding even apparatus tests (sensory acuity, distance perception, and
simple reaction time). A coefficient of the order of about .35 is descriptive
of the tapping and dotting tests. Tests of mechanical principles, judgment
of distance by perspective, and judgment of linear distances were
found to be less useful. For them a validity coefficient of .15 is
representative.

[...] research. The results of a single investigation should not be
considered definitive but certainly would be a helpful addition to knowledge
in such a relatively unexplored area. It was with this intent that [...]

Description of Predictors 


Eight different speed tests and an interest inventory which yielded
four different scores were used. The first five of these tests were the paper
and pencil tests employed by Ghiselli, Brown, and Minium with street car
motormen (4). Following is a description of each predictor.


Dotting. This test consists of a series of circles one-eighth inch in diameter,
[...]ted by lines, and irregularly spaced. The subject is instructed to place
a dot in each circle. No pretest practice is given and the time limit is
one-half minute.

Tapping. In this test the subject is presented with a series of circles of
one-half inch in diameter and is instructed to put three dots in each circle.
No pretest practice is permitted and the time limit is one-half minute.

Judgment of Distance. The intent of this test is to measure [...]
distances between objects utilizing as cues only perspective and [...].
Each item is a schematic representation of a table top on which there are
four equally sized cubes. The positions of the cubes and the angle [...] are
different in the different items. The task is to judge which cube is
nearest to a designated key cube. The time allowed for this is [...]
minutes.

Distance Discrimination. Each item in this test consists of a square in
which there are three test points and a reference point. The task is to judge
which of the test points is closest to the reference point. Various distracting
[...] among the points. The time limit is three and one-half minutes.

Mechanical Principles. This test consists of a series of pictorially presented
items illustrative of various mechanical principles. Most of the items are
concerned with the movement of vehicles and the operation of levers. Only
simple mechanical principles are involved. The time for this test is eight
minutes.

Numerical Problems. This is an arithmetic test involving the making of 
change and the computation of fares when various rates and lengths of trip are 
given. Five and one-half minutes are allowed for this test. 

Speed of Reactions I. This test is an attempt to put in paper and pencil 
form a complex reaction test such as the Viteles Motorman Test. Each item 
consists of a square in which different letters appear in various spatial arrange- 
ments. Depending on the specific letters given in an item and their spatial 
arrangement, the subject makes a mark in one of five spatially differentiated 
circles placed below the square. This test is preceded by detailed instructions 
concerning the rules together with some examples for practice. On each page 
of the test proper the rules are given so the subject has them immediately
available for reference. The time for this test is four and one-half minutes.

Speed of Reactions II. This test is the same as Speed of Reactions I except
that the subject can never refer to the rules once the test is begun. Four 
minutes are allowed. 

Interest. The interest inventory has no time limit. It yields four separate 
scores and a total score which is simply the sum of the part scores. For each 
of the four scales there are 24 items. In each item the subject chooses between 
two different occupations. He is instructed to choose on the basis of interest 
and ignore such matters as pay, vacations, and the like. The first scale has to 
do with Occupational Level, and compares a job roughly at the semiskilled level 
with a higher level job. The correct answer is the lower level job. The second
scale is concerned with Outside Occupations, the correct choice being a job which 
is done out of doors rather than indoors. The third scale attempts to measure 
interest in occupations which require dealing with the public, and is termed 
Dealing with People. The last scale, Related Occupations, compares jobs in- 
volving the operation of vehicles with other types of jobs. 


In addition to these tests certain personal data concerning the drivers 
were available. This information consisted of age, years of formal edu- 
cation, years of previous experience driving taxicabs, and years of experi- 
ence operating other types of vehicles, either commercially or in the armed 
services. Since considerable reliance is frequently put on these variables
in hiring drivers, they, too, were studied in relation to accidents.


Subjects 


The subjects used in this investigation were 67 men who applied for 
work and were employed as drivers by a taxicab company during the same 
three-month period. All men took the tests prior to being hired, and to 
some extent their scores were taken into account in the decision regarding 
their employment. In the selection process a profile of test and inventory 
scores was plotted for each person and those individuals who showed
marked deficiencies were rejected. Greatest emphasis was given to the
numerical problems and speed of reactions tests and to the interest
inventory scores. Approximately one out of five applicants was rejected
on this basis. With the exception of the speed of reactions tests all men
took all tests. For the speed of reactions tests the number of cases is 57.




Criterion 
Accidents are a notoriously unreliable index of human behavior (5).
Furthermore, accident proneness is by no means a unique trait. Examination
of the relationships among different types of accidents incurred by
operators of public conveyances indicates that there is considerable
specificity (1). The problem of setting up a criterion to measure safety of
performance, then, is fraught with many difficulties. In the present
investigation the situation was complicated by the fact that only the safety
records for the first five weeks of employment were available. However,
this period is particularly critical since it is during this time that
supervisors’ judgments concerning the drivers’ skill are crystallized. Of the
67 drivers, 48 incurred no accidents during the first five weeks of
employment, 17 were involved in one accident, and two men were involved in
[...]

The men were divided into two groups, the accident free men (48 cases)
and the accident group (19 cases), and biserial coefficients of correlation
were utilized as indices of validity.
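
As an illustration of the index named above, the following sketch computes a biserial coefficient from two groups of scores. It assumes the textbook formula, the difference between group means divided by the standard deviation of all scores and multiplied by pq/y, where y is the ordinate of the unit normal curve at the point of division. The article does not print its formula, and the scores shown are hypothetical, not data from the study.

```python
from statistics import NormalDist, mean, pstdev

def biserial_r(scores_group_1, scores_group_0):
    """Biserial correlation between a score and a two-way split of a criterion."""
    all_scores = list(scores_group_1) + list(scores_group_0)
    p = len(scores_group_1) / len(all_scores)   # proportion of cases in group 1
    q = 1.0 - p
    # Ordinate of the unit normal curve at the point dividing p from q.
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    spread = pstdev(all_scores)                 # S.D. of all scores combined
    return (mean(scores_group_1) - mean(scores_group_0)) / spread * (p * q / y)

# Hypothetical test scores for illustration only.
accident_free = [52, 47, 55, 60, 49, 58, 51, 53]
accident_group = [45, 50, 41, 48]
validity = biserial_r(accident_free, accident_group)
print(round(validity, 2))
```

With the accident free group entered first, a positive coefficient means that higher test scores go with freedom from accidents. Reversing the order of the groups changes only the sign.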


Results 

In Table 1 are given the validity coefficients for the various predictors
using accidents of the drivers during their first five weeks of employment
as the criterion. It is apparent from this table that with the exception
of the dotting and tapping tests, and possibly the interest inventory, none
of the validity coefficients can be considered particularly significant.


Table 1

Validity Coefficients for Various Predictors in Relation to Accidents

                                               Validity
Predictor                                      Coefficient

Dotting                                         .23
Tapping                                         .23
Judgment of Distance                            .11
Distance Discrimination                         .20
Mechanical Principles                          -.08
Numerical Problems                             -.10
Speed of Reactions I                            .00
Speed of Reactions II                           [illegible]
Total Interest Inventory                        .07
Occupational Level (Interest)                   [illegible]
Outside Occupations (Interest)                  [illegible]
Dealing with People (Interest)                  [illegible]
Related Occupations (Interest)                  [illegible]
Age                                             [illegible]
Years of Formal Education                       [illegible]
Years of Experience Driving Taxicabs            [illegible]
Years of Commercial and Service Driving         [illegible]


The validity coefficients of the first five tests, dotting, tapping,
judgment of distance, distance discrimination, and mechanical principles, are
approximately of the order found earlier for motormen (4). The
predictive power of the first two tests is fairly substantial while that of the
latter three is low. On the basis of the motorman study weights were 
developed for these five tests for computation of a battery score. The 
effective weights are as follows: dotting and tapping each four, judgment
of distance and distance discrimination each two, and mechanical princi- 
ples one. Applying these effective weights to the scores of the 67 taxicab 
drivers the validity coefficient of the battery was found to be .69. None 
of the accident group fell in the upper 25% of scores. Undoubtedly this 
coefficient is fortuitously high and would not be obtained with another 
similar sample. For the motormen the validity of this battery was of the 
order of .35. A battery of this type certainly seems worthy of further 
study for the selection of operators of vehicles. It is apparent that the 
tapping and dotting items are the most important components of the 
battery. Remembering that the total testing time for these two tests is 
only one minute the validity coefficients are surprisingly high. They are, 
in fact, too high to be readily acceptable just on the basis of two investiga- 
tions, and further evaluation of these tests is indicated. 
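
The weighting scheme described above can be sketched as a weighted sum of the five test scores. The sketch assumes that "effective weights" are applied to scores first put on a common scale (standard scores), which the article does not spell out. The dictionary keys, function names, and applicant scores are hypothetical.

```python
from statistics import mean, pstdev

# Effective weights as stated in the text: 4, 4, 2, 2, 1.
WEIGHTS = {
    "dotting": 4,
    "tapping": 4,
    "judgment_of_distance": 2,
    "distance_discrimination": 2,
    "mechanical_principles": 1,
}

def standardize(values):
    """Convert raw scores to standard scores (assumed scaling step)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def battery_scores(raw):
    """Weighted sum of standard scores for each applicant."""
    z = {test: standardize(raw[test]) for test in WEIGHTS}
    n = len(raw["dotting"])
    return [sum(WEIGHTS[t] * z[t][i] for t in WEIGHTS) for i in range(n)]

# Hypothetical raw scores for four applicants.
raw = {
    "dotting": [30, 42, 36, 48],
    "tapping": [55, 61, 49, 70],
    "judgment_of_distance": [12, 15, 9, 14],
    "distance_discrimination": [20, 18, 25, 22],
    "mechanical_principles": [17, 21, 14, 19],
}
scores = battery_scores(raw)
print([round(s, 2) for s in scores])
```

Because dotting and tapping together carry eight of the thirteen weight units, the composite is dominated by the two half-minute tests, which is the point the text raises about the battery.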

The fact that scores on the numerical problems test were found to be 
unrelated to accidents is not unexpected. There is no reason to suppose 
that ability to solve arithmetical problems is related to capacity to drive 
a motor vehicle safely. However, the complete lack of validity on the 
part of the speed of reactions tests certainly was not anticipated. At 
least, superficially, these tests seem to measure very nearly the same 
abilities as those measured by complex reaction tests, such as the Viteles 
Motorman Test, which have considerable predictive power. Perhaps 
these findings may be taken to indicate that certain kinds of abilities 
measured by apparatus tests and important in safe performance cannot 
be measured by paper and pencil prototypes. 

The interest inventory that was utilized was designed principally to
indicate which applicants would tend to stay on the job as taxicab drivers 
and which would tend to leave. In the area where this study was made 
labor turnover in the taxicab industry was quite high. An analysis of 
the situation indicated that a large proportion of the persons who left 
employment did so because they found that they did not like the nature 
of the work or at least they preferred other types of jobs. Nevertheless, 
the validity coefficient of .28 is suggestive of the value of interest measures 
in the prediction of accidents. The particular inventory utilized in this 
investigation undoubtedly can be greatly improved and the fact that 
three of the scales (occupational level, outside occupations, and related
occupations) yielded validity coefficients of .20 cannot be ignored. 

In the hiring of workers considerable emphasis is given to the factors
of age, education, and work history. The subjects in the present
investigation differed markedly in these variables. The range in each
variable is as follows: for age, 21 to 53 years; for education, seventh grade
to college graduate; for previous taxicab experience, none to 6 years; and
[...] types of experience driving vehicles other than private cars, none to
[...] years. In spite of the wide range of individual differences none of
these factors was found to be appreciably related to safety of operation.
In view of the success with which personal data has been used in the
selection of employees in other occupations the relationships here may
be subject to some doubt. It is quite possible that the relationships
are curvilinear and thus would not be adequately measured by biserial
coefficients. The number of cases in the study was considered too small
to warrant any detailed study of curvilinear relationships. However,
inspection of the tabulations suggested an optimal age of 30 years and an
optimal educational level of 10 grades for safe performance on the job.


Discussion


Taken together with the findings of previous investigations, the results
of this study indicate that accidents can be predicted by paper and pencil
tests. No brief can be made for the position that such tests invariably
will be effective, but at least it can be said that they can be as effective as
apparatus tests. It is likely, of course, that a combination of paper and
pencil and apparatus tests would give better results than either the one
or the other type alone. However, it may be questioned whether for
purposes of employee selection the inclusion of apparatus tests increases
the goodness of prediction sufficiently to warrant the increased costs. It is
true that apparatus tests have considerable face validity and thus are
meaningful to applicants. Nevertheless, paper and pencil tests of the
type used in the present investigation, while novel to the large bulk of
applicants, were readily accepted by them. By means of proper
instructions and judicious choice of items paper and pencil tests, too, can
be made with satisfactory face validity.

The battery consisting of the dotting, tapping, judgment of distance,
distance discrimination, and mechanical principles tests seems most
promising as a device for the selection of safe vehicle operators. On
two different groups of operators it has given acceptable results. One
difficulty with the battery is that most weight is assigned to two very
short tests. It is almost inconceivable that the dotting and tapping
tests with their half-minute times could have reliability coefficients
higher than .60, and, indeed, a lower figure would seem more reasonable.
It is possible, of course, that they do measure abilities of particular
importance to safe operation, but certainly more evidence than two studies
involving only a total of some 220 cases is necessary before any great
confidence can be placed in them.




Another difficulty with the battery utilized here, and in fact, with any 
paper and pencil tests, is their relative uselessness for training purposes. 
Scores on visual and reaction time tests are most helpful aids in safety 
training programs. Having estimates of an individual’s visual acuity,
glare sensitivity, reaction time, etc., intelligent recommendations can be
made and pertinent training initiated to compensate for any deficiencies. 
With motor vehicle operators Fletcher has been able to produce significant 
and long term improvements in driving safety by utilizing such tests and 
interpreting the significance of the scores to the individuals concerned 
(3). Probably the best that paper and pencil tests can do for training 
is to indicate which individuals should be given special attention in any 
safety training programs.

Summary 


The scores earned by 67 taxicab drivers on eight paper and pencil 
tests and an interest inventory, together with certain personal data items, 
were studied in relation to safety of operation. Accidents during the first 
five weeks of employment formed the criterion in the computation of 
validity coefficients. Dotting and tapping tests were found to have the 
highest validity. Tests involving judgment of distances and of knowledge 
of simple mechanical principles yielded low validity coefficients. A com- 
bination of the scores on the five foregoing tests, weighted on the basis of 
evidence collected in another investigation with an entirely different 
group, yielded a battery which was found to have a validity coefficient of 
.59 for accidents. An arithmetic test and a paper and pencil test of com-
plex reactions were found to be useless in predicting accidents. Interest
measures showed some promise, particularly for scales of occupational 
level, outside occupations and related occupations. No significant rela- 
tionships were found between the accident criterion and age, education, 
and previous driving experience. 

Received April 1, 1949. 

References 

1. Brown, C. W., and Ghiselli, E. E. Accident proneness among streetcar motormen 
and motor coach operators. J. appl. Psychol., 1948, 32, 20-23. 

2. DeSilva, H. R. Why we have automobile accidents. Wiley, 1942. 

3. Fletcher, E. D. Capacity of special tests to measure driving ability. Mimeo- 
graphed, undated. 

4. Ghiselli, E. E., Brown, C. W., and Minium, E. W. The use of test scores for the pre-
diction of accidents of street car motormen. Report to the Municipal Railway
System of San Francisco, 1946.

5. Ghiselli, E. E., and Brown, C. W. Personnel and industrial psychology. New York:
McGraw-Hill, 1948.

6. Personnel Research Section, Adjutant General’s Office. Statistical Manual. 1944.

7. Viteles, M. S. Industrial psychology. New York: Norton, 1932.


Some Precautions in the Use of the Per Cent 
Method of Job Evaluation 


William D. Turner 
University of Pennsylvania 


_ Section 5 of the computation for establishing factor comparison scales 
y the per cent method of job evaluation (3) involves a table of F%/J% 
ratios (or estimates of relative job totals, see 4, p. 156 f.) the columns of
which have different totals. It may be assumed that such differences
between column totals follow principally from fortuitous variation of
intra- and inter-job factor patterns, since judgment errors contributing to
such differences will tend to be cancelled out in each column total. The
individual ratios in these columns are quite fallible estimates of relative
job totals, because each ratio is based ultimately on two independent and
fallible per cent ratings. An improved estimate of job totals could be
had if there existed several such estimates for each given job, which could
then be averaged with the expectation of cancelling out material portions
of the judgment errors inhering in each. One’s first impulse is to average
the ratios in each given row of Section 5. But the aforementioned differences
between the totals of the columns of these ratios indicate that each
column is of a different order of magnitude, and that an estimate in one
column is therefore not strictly comparable with one in the same row but
another column. Accordingly, in Section 5 of the computation, four of
the columns are multiplied by “Reduction Constants” so that their
respective totals become equal to that of the remaining (M) column whose
total is arbitrarily taken as a base. The results of such multiplications
appear in Section 8 of the computation. The rows in Section 8 are then
summed (which amounts to averaging, since each row contains the same
number of ratios) to yield the column of “(8) Totals” appended to
Section 4. These “(8) Totals” represent the improved estimates of job
totals sought above.
Likewise, improved estimates of factor totals are obtained by applying
“Reduction Constants” to the rows of Section 6 (J%/F% ratios), so
that the resulting columns in Section 7 contain relative factor totals of
comparable orders of magnitude, which are in turn summed to obtain
improved estimates of such totals. See the “Total” row in Section 7,
whose contents are subsequently “converted” to adjust their general
level of magnitude to that of the “(8) Totals” discussed above, before
being appended to Section 3.
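
The “Reduction” step just described can be illustrated by a minimal sketch in Python. The table of ratios is invented for illustration and is not taken from the computation sheet in (3); the function name reduce_and_sum is hypothetical, and treating the last column as the base (M) column is an assumption made only for this example.

```python
def reduce_and_sum(ratios, base_col):
    """Scale every column so that its total equals the base column's total,
    then sum each row of the reduced table."""
    n_cols = len(ratios[0])
    col_totals = [sum(row[c] for row in ratios) for c in range(n_cols)]
    base_total = col_totals[base_col]
    # One "Reduction Constant" per column; the base column's constant is 1.
    constants = [base_total / t for t in col_totals]
    reduced = [[row[c] * constants[c] for c in range(n_cols)] for row in ratios]
    row_totals = [sum(row) for row in reduced]
    return constants, reduced, row_totals


# Hypothetical F%/J% ratios: 3 jobs (rows) by 3 factors (columns).
ratios = [
    [2.0, 1.0, 4.0],
    [1.0, 0.5, 2.0],
    [1.0, 0.5, 2.0],
]
constants, reduced, row_totals = reduce_and_sum(ratios, base_col=2)
print(constants)   # multipliers applied to each column
print(row_totals)  # summed rows of the reduced table
```

After reduction every column has the same total, so the row sums give each column equal weight, which is the point of the step.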


548 William D. Turner 


Consequences of Eliminating the “Reduction” Process 


Hay (2) essentially proposes to eliminate the “Reduction” operation 
in question, and, hence, to average incomparable relative figures. If one 
had valid reason to weight differently, say, the columns in Section 5, one 
could quite properly depart from the writer’s procedure. But there is no 
known rational basis for such differential weighting, and Hay’s procedure 
accepts an irrational and adventitious weighting for which there is no 
theoretical defense. 

Whether Hay’s method can be justified in practice depends then on 
the magnitude by which his results depart from those yielded by the 
writer’s method, and on the relative value of committee time lost in cor- 
recting the larger, ultimately detectable errors introduced by Hay’s 
method. 

When Hay’s abbreviated procedure is applied to the data in Section 
5 of the writer’s article (3), and when the general level of magnitude of the 
resulting relative job and factor totals is adjusted to that of such totals 
in Sections 3 and 4 of the method, factor ratings by F% and J% com- 
parable with those in Section 11 may be computed. When this is done, 
and when such totals and ratings are compared with those obtained by 
the complete computation, 45% of the 137 percentage differences be- 
tween the results by the two methods are greater than zero. 28% of 
these percentage differences exceed 3; 21% exceed 4; 20% exceed 5; and
13% exceed 6. Three each of these differences exceed 7 and 8; and one
each exceeds 9, 10, 11, and 12.
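
A tabulation of this kind may be sketched in Python as follows. The two short lists of results are invented for illustration and are not the 137 values of the study; the function names are hypothetical, and taking the result of the complete computation as the base of each percentage is an assumption.

```python
def percentage_differences(complete, abbreviated):
    """Absolute difference of each abbreviated result from the complete
    result, expressed as a per cent of the complete result."""
    return [100 * abs(a - c) / c for c, a in zip(complete, abbreviated)]


def share_exceeding(diffs, threshold):
    """Per cent of the differences that exceed the given threshold."""
    return 100 * sum(d > threshold for d in diffs) / len(diffs)


# Hypothetical ratings by the complete and the abbreviated computation.
complete = [100, 200, 50, 80]
abbreviated = [100, 210, 52, 76]
diffs = percentage_differences(complete, abbreviated)

for threshold in (0, 3, 4, 5):
    print(threshold, share_exceeding(diffs, threshold))
```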

The significance of the foregoing percentage differences depends 
particularly on the frequency and magnitude of the larger ones relative to 
a job rating committee’s judgment error. So far as the writer is aware, 
no one has determined the exact value of such an error for a given com- 
mittee. However, the committee whose results illustrate the writer's 
article showed an average 5% percentage difference between its original
and its reviewed ratings on about 300 jobs, when the review in question 
followed soon after a leveling off of the committee’s skill. Percentage 
differences between the results of a subsequent review and the one 
mentioned can safely be estimated to lie between 2% and 4%. Assum- 
ing a 3% error as a compromise figure, about a third of the ratings which 
Hay’s procedure would have supplied to this committee would have de- 
parted from the ratings produced by the original computation by amounts 
equalling or exceeding the committee's own emerging average judgment 
error, and several of them would have differed from the latter ratings by 
as much as three or four times this error. 

Again, the foregoing discussion obviously assumes that ratings produced
by the original computation approximate true rating values more


Precautions in Use of Per Cent Method of Job Evaluation 549 


than do those produced by Hay’s computation, and that the size and
frequency of discrepancies between the ratings produced by the two
methods signify the degree of fault in ratings produced by Hay’s method.
This assumption is made because, as noted above, Hay’s computation
involves the operation of averaging relative figures of different orders of
magnitude. Such an operation augments the inevitable errors (of judgment)
already present in the ratings produced by the original computational
procedure. The rational “Reduction” process in Sections 5 and 6
of the original procedure minimizes this difficulty.
Statisticians will question why an average percentage discrepancy of 
'% between the results by the two computational methods is important 
the committee’s average judgment error itself is probably as great as 
Since a job evaluation committee is not so concerned with achieving
a small average error characteristic of its ratings in the aggregate as it
is with minimizing its error in each particular rating it makes, those
larger errors actually introduced by Hay’s method, and which the committee
can come ultimately to detect, become important. There are 18
such discrepancies in the illustrative case, which are at least twice as
great as the committee’s average judgment error, and 4 of which are at
least three times as great as this error. Only a few such errors in the
nal scales are all that are needed to throw many subsequent ratings 
out of line before the committee can become able to recognize their incorrectness,
and the committee would then need considerable time to
make the necessary corrections. Such committee time would be considerably
more expensive than the hour or less gained by the computer
using Hay’s abbreviated procedure; and unless the discrepancies in
question are ironed out, errors and borderline cases in job grading will be


On the basis of the foregoing findings and conclusions, the writer sees
fit to warn against the use of Hay’s abbreviated procedure, and to recom-


Cases, 


Mathematical Relations Underlying the Per Cent Method 


It should be observed that Hay’s proofs that row totals correspond
with job totals, and that reciprocals of column totals correspond with
factor totals, obscure their underlying assumption that relative values of
different orders of magnitude can be validly averaged. The writer’s
foregoing findings, and the marked differences between some of the
“Reduction Constants” which appear in Sections 5 and 6 of the writer’s
article (3), emphasize the practical untenability of such an assumption
and of the proofs which Hay bases on it.




In the writer’s second article (4) the attempt was made to show some 
of the meaning of the per cent method’s complete computations, without 
recourse to algebraic notation. An algebraic version of pages 155 and 156 
of this article follows: 


Let
$r$ = any one of a number of factor ratings obtained during the use of
established factor rating scales;
$T_J$ = the total of factor ratings ($r$’s) for any job;
$T_F$ = the total of factor ratings ($r$’s) for any one factor for any given
group of jobs for which $T_J$’s are also available;
$J = r/T_J$ = “J value” (see Table II, p. 155, Reference 4) corresponding
to a given $r$; and
$F = r/T_F$ = an “F value” (see Table III, p. 155, Reference 4) corresponding
to the same $r$.

Then, $F/J = (r/T_F)/(r/T_J) = (r/T_F)(T_J/r) = T_J/T_F$.

But for any given factor grouping of $r$’s, $T_F$ would be constant, and
may be regarded as equal to unity. Therefore, the relative job totals, or
$T'_J = F/J$.

By corresponding proof, the relative factor totals, or $T'_F = J/F$.
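
The identity above can be checked numerically. The following Python sketch uses an invented table of factor ratings (not data from Reference 4); the helper name f_over_j is hypothetical.

```python
# Hypothetical factor ratings r: rows are jobs, columns are factors.
ratings = [
    [10, 20, 30],
    [20, 20, 40],
    [30, 40, 50],
]
job_totals = [sum(row) for row in ratings]            # T_J for each job
factor_totals = [sum(col) for col in zip(*ratings)]   # T_F for each factor


def f_over_j(i, k):
    """F/J for the rating of job i on factor k."""
    r = ratings[i][k]
    J = r / job_totals[i]       # rating as a share of its job total
    F = r / factor_totals[k]    # rating as a share of its factor total
    return F / J


# Within one factor column T_F is constant, so F/J varies only with T_J.
column0 = [f_over_j(i, 0) for i in range(3)]
print(column0)
print([t / factor_totals[0] for t in job_totals])
```

The two printed lists agree to within rounding, showing that the F/J ratios in any one column are proportional to the job totals.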


An algebraic account of the more complex situation that arises when 
fallible F% and J% values are substituted for mathematically infallible 
F and J values, would be essentially descriptive. Since the algebraic 
expressions involved would be even less easily followed by most readers 
than is the verbal account in the writer’s second article, the writer has not 
published such an algebraic description. 


Weber’s Law and Job Evaluation 


Hay implies (1, 2) that Weber’s Law applies to factor comparison job
evaluation. In brief, Weber’s Law says that a discriminable difference
between two physical stimuli bears a constant ratio to the level of
magnitude of the stimuli themselves. The absence of any physical or
objective measure of job values makes an application of Weber’s Law to
job evaluation data logically impossible. Hay refers to unpublished
evidence of his own, to the effect that discriminable differences expressed
in rating scale units bear constant ratios to the magnitude of ratings in
factor comparison job evaluation, but apparently fails to consider the
subjective nature of the scale units in which such differences and their
corresponding ratings are necessarily expressed. Hence, the law which
can properly express Hay’s observations is not Weber’s psychophysical
Law, but a psychological Law of Per Cent Judgment which may be
written L/J = K, in which

L = any discriminable difference (limen) expressed in per cent rating
scale units or their equivalent;¹

J = the point level of the scale rating characterized by L; and

K = an empirical constant.


It is evident that this Law of Per Cent Judgment involves no physical
or objective measures, and that any similarity which it bears to Weber’s
Law is essentially mathematical; it is a psychological and not a psychophysical
law. This latter distinction, which may seem to be purely theoretical,
has an important practical implication for job evaluation. An
assumption that Weber’s Law applies to job evaluation data implies quite
directly that job values can be ascertained by some physical or objective
method of measurement. Such an assumption would lend unwarranted
support to one of job evaluation’s recurrent delusions which
holds that “somewhere there exists the ‘real objective truth’ about job
values.” The Law of Per Cent Judgment emphasizes the inescapably
subjective nature of the process of job evaluation.
Hay’s proposed “limen” of 15% is actually determined by the Law
of Per Cent Judgment formula given above, with L equalling the scalar
distance from the committee’s corresponding final ratings which includes
st 75% of committee members’ individual preliminary ratings; 
with J equalling the former ratings; and with the K value multiplied by
100 to yield a percentage figure. However, the members of one of the
rating committees whose work the writer has directed manifested in the
aggregate a corresponding “limen” ratio equal to 9%, with corresponding
“limens” for individual members lying between 6% and 11%. In order
that a committee’s judgments may express more fully the accuracy of
which the committee and its members are capable, the writer recommends
again (cf. 3) that the size of geometric steps on factor rating scales be
commensurate with a committee’s emerging judgment accuracy rather
than with a prescribed standard “limen” of 15%.
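
A “limen” ratio of the kind discussed above might be computed as in the following Python sketch. The ratings are invented, the function name limen_ratio is hypothetical, and reading L as the smallest distance from the final rating that takes in at least 75% of the preliminary ratings is an assumption about how the distance is fixed, not a statement of Hay’s or the writer’s exact procedure.

```python
import math


def limen_ratio(final_rating, preliminary_ratings, coverage=0.75):
    """Return 100 * L / J, where J is the final rating and L is the smallest
    distance from J that includes the stated share of preliminary ratings."""
    distances = sorted(abs(p - final_rating) for p in preliminary_ratings)
    needed = math.ceil(coverage * len(distances))  # ratings that must fall within L
    L = distances[needed - 1]
    return 100 * L / final_rating


# Hypothetical data: one final committee rating and eight members'
# individual preliminary ratings of the same job on the same factor.
final = 100
prelim = [92, 95, 97, 99, 101, 104, 108, 115]
ratio = limen_ratio(final, prelim)
print(ratio)
```

In practice such ratios would be pooled over many ratings before a single committee figure were quoted.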


Received March 31, 1949.


term, “their equivalent” is meant to signify factor rating scales derived by the 
method, or “a scales as derived by Benge’s method. The latter equivalent 

by the extremely close correspondence (apparently limited only by judgment 
between ratings by the per cent method and Benge’s method. Calling the present 
Per cent judgment relates it directly to the more controllable process of (per 
ent rather than to the salaries of equitably paid key jobs from which Benge 
his scales. The generally linear correlation between either per cent method or 

hod ratings and non-negotiated rates of pay, and the much closer agreement 
tings by the two methods in question, indicate that Pay rate setting in the 
job evaluation rests on a less accurate form of per cent judgment. 




References 


1. Hay, Edward N. Characteristics of factor comparison job evaluation. Personnel, 
1946, 22, 370-375. 

2. Hay, Edward N. Creating factor comparison key scales by the per cent method. 
J. appl. Psychol., 1948, 32, 456-464. 

3. Turner, William D. The per cent method of job evaluation. Personnel, 1948, 24, 
476-492. 

4. Turner, William D. The mathematical basis of the per cent method of job evalu- 
ation. Personnel, 1948, 25, 154-160. 


Predicting Subject Grades of Liberal Arts Freshmen 
with the Kuder Preference Record * 


Dorothy Terry Hake and C. H. Ruedisili 
University of Wisconsin 


The solution to the problem of the prediction of achievement is still
in a preliminary stage. Interests are generally conceded to be important
in determining success in any field of endeavor, but they are only one of
many factors, such as intelligence, attitudes, and personality traits. The
present study deals with the relationships between college achievement in
specific subject areas and interest test scores. We have investigated the
value of the Kuder Preference Record in predicting the first-semester
grades made by freshmen at the University of Wisconsin.

Since the Preference Record was developed to measure interests, a
high correlation with specific achievements would not necessarily be
expected, but a low positive correlation with general college achievement
has been found in previous studies. Crosby (2) has found positive correlations
in the high .60’s between the Preference Record Scientific scale
and chemistry and biology grades, and between the Computational scale
and accounting grades. His subjects were students scoring above the
90th or below the 10th percentile in each of the Preference Record scales,
and therefore the relationship between interests and grades is not as high
as it appears at first glance. If the total distribution of interest scores
had been used, the correlations would have been considerably lower (6).

Thompson (9) reports some success in predicting dental school success
by using the Preference Record in conjunction with the MacQuarrie Test
for Mechanical Ability. Bolanovich and Goodman (1) used the Preference
Record to select women students for training programs in electronic
engineering during the war. They found significant differences between
successful and unsuccessful students. The former showed high scores
on the Computational and Scientific scales, and low scores on the Musical
and Clerical scales. Yum (10) found differences among students enrolled
in the Physical, Biological, and Social Sciences, and also found low positive
correlations with grade-point average.

* Based on a dissertation submitted as partial fulfillment of the requirements for the
M.A. degree at the University of Wisconsin, January, 1945.

Strong (8) mentions that since college courses are largely elective, the
student does not choose courses in which he has no interest. He points
out that interests, then, would not be important in determining success in
elective college courses. University of Wisconsin freshmen at the time
of this study, however, were much restricted in their choice of courses.
The subject fields used were all freshman courses, and the usual freshman
program consisted of four out of the five areas: English, Foreign Language,
History, Science, and Mathematics. The problem of free selection,
then, is largely eliminated from the study and, presumably, measured
interest scores might have some part in determining success in these
subjects.
Procedure 

The Preference Record was given to all Letters and Science freshmen 
who entered the University of Wisconsin in the fifteen-weeks summer 
session and the fall semester of 1943. The first-semester grades in five
subjects and the over-all grade-point averages were obtained. The subjects
included were English (N = 579), Science (N = 528), History
(N = 402), Foreign Language (N = 477), and Mathematics (N = 201).
All freshmen who took the Kuder Preference Record in the entrance 
examinations and completed the first semester, including one or more of 
the five courses, were included in the study; altogether, 594 students met 
these requirements (3). Men and women students were combined in 
this group since there were no separate norms for the Kuder Preference 
Record at the time this study was made (4). 


Results and Discussion 


The means and standard deviations of the Preference Record scales
and of grades were computed for each subject area and for the whole
course load (Grade-Point Average). These figures are in terms of raw
scores for the scales and are not equated from scale to scale. The grades
are in terms of grade-point averages, with 1.00 representing a grade of C.
Differences in mean interest scores between subject groups are frequently
large. The Mathematics group seems to be especially deviant in interest
scores. It is relatively high in Computational and Scientific interest
(39.32 and 66.89 respectively) and low in Social Service (69.17). This
suggests that students taking Mathematics may have had more specific
interest in the subject than did students taking other subjects. This
seems likely, since Mathematics is often avoided as a difficult subject.
Interest scores for the other subject groups are fairly similar, however,
bearing out the preliminary hypothesis that these students’ interests do
not materially affect their choice of courses. The highest grade-point
average was obtained by the Foreign Language group (1.37) and the
lowest (1.15) by the English students. The highest S.D. occurred in the
History group (1.46) and the lowest (0.82) in the English group.




Table 1 
Correlations between Preference Record Scores and Grades 
English Science History Foreign Language Mathematics
aT Oo =—13 =J18 -02 
—.10 10 03 03 10 
— 04 18 D -0 10 
08 —.05 A 00 — 02 
0 -03 =03 o =) 
25 — 01 13 12 02 
03 —.06 ~.01 10 = 10 - 01 
— 03 —14 -4 -09 -00 ~ 12 
Sior e eo o% o =o 00 


Table 1 contains the correlations between Preference Record scores
and subject grades as well as grade-point average. The first column consists
of the correlations between the grades of students who took English
and their scores on each of the nine scales. The other columns show the
same relationships for the remaining subjects. There are some expected
relationships apparent in this table. The highest positive correlations
are between Literary interests and grades in English, History, and the
grade-point average. The correlation between Science and the
Scientific scale is in the expected direction. The inverse correlations
between Mechanical interests and grades in English, History, and
Foreign Language also seem reasonable. The Artistic and Clerical
scales have the lowest correlations with grades in general.

The correlations among the individual scales on the Preference Record
for the group as a whole are shown in Table 2. These intercorrelations,
as a whole, are either low positive or negative. Relatively high positive
correlations, however, are found between the Scientific and Mechanical
scales, the Literary and Persuasive scales, and between the Clerical
and Computational scales. The highest negative correlation is that between
the Scientific and Persuasive scales, while other fairly large inverse
correlations are those between Mechanical and Literary, Scientific and
Literary, Musical and Mechanical, Computational and Artistic, Scientific
and Clerical. These same tendencies also can be noted in the intercorrelations
found in five other groups, as described in the Revised Kuder Preference
Record Manual (5).

[Table 2. Intercorrelations among Kuder Scores (Mechanical, Computational, Scientific, Persuasive, Artistic, Literary, Musical, Social Service, Clerical)]

The results of the Wherry-Doolittle Method of Test Selection in- 
cluded the progressive shrunken multiple-correlation coefficients, the 
uncorrected multiple-correlations, the name of the first scale which was 
not included in the battery because it caused a decrease in the shrunken 
multiple-correlations, and K (the coefficient of alienation). The highest 
of these shrunken multiple-correlations is only .3093 for General Grade- 
Point Average. One might expect to find that general school achieve- 
ment can be predicted more accurately from interests than can grades in 
any one subject. Since the multiple-correlations for English (.2936) and 
History (.3022) are almost identical with that for the Grade-Point Aver- 
age, however, there must be other factors involved. Possibly the rela- 
tively large size and heterogeneity of these groups may account for the 
higher multiple-correlations. The lowest multiple-correlation (.1997) is 
that using Mathematics as a criterion, and this group has the smallest N. 
It must also be remembered that this group seems to have relatively
similar interests, and thus is a more homogeneous group. For the whole 
group, the test adding more chance error than validity was the Clerical 
scale; for the History, English, and Foreign Language groups it was 
Artistic; for Science it was the Persuasive scale, and for Mathematics 
the Computational scale. 
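
The coefficient of alienation mentioned above is conventionally defined as K = sqrt(1 - R^2). A minimal Python sketch follows, applied to the shrunken multiple-correlations quoted in the text; whether the authors computed K from the shrunken or the uncorrected coefficients is not stated, so the pairing here is an assumption, and the function name is hypothetical.

```python
import math


def coefficient_of_alienation(r):
    """K = sqrt(1 - R^2): the ratio of the standard error of estimate
    to the standard deviation of the criterion."""
    return math.sqrt(1 - r * r)


# Shrunken multiple-correlations reported in the text.
shrunken_r = {
    "Grade-Point Average": 0.3093,
    "History": 0.3022,
    "English": 0.2936,
    "Mathematics": 0.1997,
}
alienation = {name: coefficient_of_alienation(r) for name, r in shrunken_r.items()}

for name, k in alienation.items():
    print(f"{name}: K = {k:.3f}")
```

Values of K this close to unity indicate that prediction from the interest scales alone reduces the error of estimate only slightly.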

Discussion 

In general, it seems likely that interests, as measured by the Kuder 
Preference Record, are a relatively minor factor in predicting college 
achievement. Used alone, the Preference Record would probably be of 
little help. The addition of the Preference Record to college entrance 
examination batteries may be advisable, however, since interest measures 
may very well contribute significantly to the multiple-correlation ob- 
tained with the traditional aptitude and achievement tests. Further 
research which combines the Kuder Preference Record and other interest 
measures with aptitude and achievement scores is advisable before interest 
tests are rejected as being useless in predicting college achievement. 


Summary 
Scores on the Kuder Preference Record were compared with the
grades obtained in five subject-fields by 594 students. The students
were first-semester freshmen in the College of Letters and Science at the
University of Wisconsin. The correlations between grades and grade-point
averages and scores on the Preference Record scales were computed
and from these the shrunken multiple-correlations were obtained for each
subject group and for the grade-point averages of the whole group by
means of the Wherry-Doolittle Method of Test Selection.

1. The means and standard deviations of the raw Preference Record
scores, with one exception, did not differ widely with respect to subject
groups, indicating that interests are not an important factor in choosing
freshman courses, and thus presumably are important in determining


2. The correlations between the scales and the grades and grade-point
average are, on the whole, fairly low, but certain logical relationships between
scores on the Preference Record and the subject groups can be
noted. The Literary scale was found to have the highest positive correlations
with the subject grades and grade-point average, while the
Mechanical and Social Service scales showed fairly high inverse correlations.
3. The intercorrelations among the scales were low and negative, on
the whole, but fairly high positive intercorrelations were found among
the Mechanical, Computational, and Scientific scales, between the
Literary and Persuasive, and between the Clerical and Computational
scales. These same tendencies have been noted in other studies.
4. The results of the Wherry-Doolittle method show that a few of the
scales, such as Mechanical, Scientific, Literary, and Social Service, were
more useful than others in contributing to the multiple-correlations. The
resulting shrunken multiple-correlations were all low. The largest obtained
was that for the total grade-point average (.3093), and this approximated
the best subject-fields, History (.3022), and English (.2936).
It is concluded that interests, as measured by the Kuder Preference Record,
may play a minor role in determining school achievement. In conjunction
with other tests (achievement, scholastic aptitude, intelligence,
attitude, etc.) the scores of this test may prove useful in the prediction of
college grades.


Received March 15, 1949. 




References 


1. Bolanovich, D. J., and Goodman, C. H. A study of the Kuder Preference Record.
Educ. psychol. Measmt., 1944, 4: 315-326.
2. Crosby, R. C. Scholastic achievement and measured interests. J. applied Psychol.,
1943, 27: 101-104.
3. Froehlich, G. J. The prediction of Academic Success at the University of Wisconsin,
1909-1941. Bulletin of the University of Wisconsin. Bureau of Guidance and
Records of the University of Wisconsin, October 1941.
4. Intermediate manual for the Kuder Preference Record. Chicago: Science Research
Associates, 1944.
5. Kuder, G. F. The revised manual for the Kuder Preference Record. Chicago: Science
Research Associates, 1946.
6. Peters, C. C., and Van Voorhis, W. R. Statistical procedures and their mathematical
bases. New York: McGraw-Hill Book Co., Inc., 1940.
7. Stead, W. H., and Shartle, C. L. Occupational counseling techniques. New York:
American Book Co., 1940.
8. Strong, E. K. Vocational interests of men and women. Stanford University Press,
1943.
9. Thompson, C. E. Personality and interest factors in dental school success. Educ.
psychol. Measmt., 1944, 4: 299-306.
10. Yum, K. S. Student preference in divisional studies and their preferential activities.
J. Psychol., 1942, 13: 193-200.


MMPI Personality Patterns for Various Occupations


E. E. Daniels and W. A. Hunter
VA Regional Office, Phoenix, Arizona


dynamic personality patterns¹ in relation to vocational selection and personnel
placement prompted the present investigation.

First, are there rather definite personality patterns which tend to
gravitate toward certain of the multitude of occupations in the vocational
field? Second, are there rather fixed “personality demands” in the
various occupations which make up the work of man?


We are aware of the numerous expressed reasons why individuals have
gone to work in a particular job or embarked on a certain career. Menninger
points up the problem as follows: “It would be interesting to examine
how it came about that some people must do continuously what seems
chiefly drudgery, while other people are able to do what seems to be
pleasurable and even delightful work, if indeed it can be called work at
all” (1). Is there a particular optimal pattern of personality factors for
each occupation which when met contributes to occupational success and
satisfaction? Will a standardized personality test be subtle enough to
bring out these patterns for research studies and even for practical
application in job counseling?
A review of literature revealed a study made by Harmon and Wiener
(2) who asserted, in using the MMPI, that personality characteristics
seem to be of crucial importance in the actual choice of a vocation, a
contention which appears to be a distinct aid in the prognosis of the success
of training. In another study Verniaud (3) administered the MMPI
to clerical workers, department store saleswomen and optical factory
workers and concluded that saleswomen tend to make responses designated
as “masculine,” industrial women show definite trends toward
hypomania and psychasthenia, while the clerical workers approach more
nearly those responses which had been termed “normal.” The present
study used the MMPI (4) as an exploratory tool in an attempt to determine
whether a relationship exists between the total personality “work
needs” and the “personality demands” of occupations.


¹ The term “pattern” is used in the dynamic sense and substituted for the term
“scale” in the MMPI.




Methodology 


The dynamics of this study were worked on over a period of 32 months 
in the Veterans Administration Regional Office, Phoenix, Arizona, and 
the raw data were collected from four VA Guidance Centers over the 
State. The conclusions are based on material drawn from a study of 893 
veterans under both Public Laws 16 and 346. All cases had availed 
themselves of complete advisement and guidance, as set forth in the VA 
Manual of Advisement and Guidance (5), which culminated in the veter- 
an’s choice of an occupation. The MMPI categories were coded along 
with veteran's name on an IBM card, and the occupations which covered 
97 category groupings from the Dictionary of Occupational Titles (6) 
were then obtained by IBM selection in terms of DOT code number from 
the master file of status cards. All cumulations were made on the IBM.

The group represented males of an average age of 23 years, constituting
several racial groups from all sections of the United States, and 90 per cent
were high school graduates. For the purpose of coding on the IBM cards,
the range of T scores on the MMPI was grouped at the center of a 10- 
point spread, i.e., at 55 for the range of scores 50 to 60. 
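
The grouping rule just described may be sketched in Python as follows. The function name group_t_score is hypothetical, and treating each interval as closed at its lower end and open at its upper end (so that a score of exactly 60 falls in the 60 to 70 interval) is an assumption, since the text does not say how boundary scores were assigned.

```python
def group_t_score(t):
    """Code a T score at the center of its 10-point interval.
    Intervals are taken as [50, 60), [60, 70), and so on (an assumption)."""
    lower = (int(t) // 10) * 10
    return lower + 5


# Hypothetical T scores for a few veterans on one MMPI scale.
scores = [47, 50, 59, 60, 73]
coded = [group_t_score(t) for t in scores]
print(coded)
```

Coding of this kind limits every individual score to one of a few values, which is worth bearing in mind when group means are compared.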

After averages had been determined for each occupation and for each
scale on the MMPI, tables were made up for each personality dimension
separately. An F-test was made for the various MMPI characteristics.
The F scores for the Mf, Pd, Sc, and Ma respectively were 5.85, 5.24,
2.79, and 3.36, which was very significant for Mf, Pd, and Ma, and significant
for Sc, which means that the chances are less than 1 to 99 that so
large an F could have occurred in a really homogeneous population. All
semi-skilled and unskilled occupations, DOT codes 6-00 through 9-99,
were omitted, leaving a total of 67 occupations. Using Fisher's Small
Sample statistical technique, the significance of the obtained difference in
the means between various pairs of occupations was calculated. In those
cases where the difference was found significant there are about five
chances, and for very significant about 1 chance in 100 that this could 
have occurred by random sampling. Not all pairs where there is a sig- 
nificant or very significant difference between the means are represented 
in the tables; only four of the MMPI scales out of a total of nine scales 
were selected for presentation in this article. 
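
An F ratio of the kind reported above may be illustrated by the following Python sketch of a one-way analysis of variance across occupational groups. The scores are invented, the function name one_way_f is hypothetical, and the assumption that the authors' F-test took this one-way form is ours; the article does not give its computing formula.

```python
def one_way_f(groups):
    """F ratio for a one-way analysis of variance:
    between-groups mean square divided by within-groups mean square."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical coded T scores on one MMPI scale for three occupations.
groups = [
    [45, 55, 55, 65],
    [55, 65, 65, 75],
    [45, 45, 55, 55],
]
f_ratio = one_way_f(groups)
print(round(f_ratio, 2))
```

The obtained ratio would then be referred to a table of F with the corresponding degrees of freedom to judge its significance.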


Results 


Table 1 is composed of a list of selected occupations in terms of the 
average T score on the Masculine-Feminine, Psychopathic, Schizophrenic 


Table 1
Mean T-scores on Four Scales of MMPI for Selected Occupational Groups


fa 


Occupational Group 


5 75.0 63.0 51.0 

5 0.0 65.0 55.0 
Reporter 10 670 70 0 079 
7 65.0 58.0 82.0 57.0 
13 58.8 57.0 55.8 57.0 
5 47.0 57.0 55.0 550 
13 610 55.0 58.0 wo 
8 55.0 63.0 48.0 wo 
10 57.0 63.0 510 620 
10 53.0 00 “0 55.0 
3 52.2 62.0 520 610 
4 65.0 55.0 520 570 
10 62.0 5.0 58.0 mo 
15 49.0 58.0 52.0 mo 
9 51.0 65.0 47.0 50 
4 0.0 625 47.0 550 
m 53.0 5.2 49.0 56.0 
19 55.0 529 40.0 0 
6 52.0 00 520 550 
183 Lo 610 wa “2 
8 58.0 62.0 450 Mo 
J 60.0 m0 50 os 
7 620 58.0 52.0 ws 
6 57.0 63.0 55.0 as 
10 530 wo 52.0 53.0 
5 570 630 55.0 630 


and Manic scales. Tables 2, 3, 4, 5, and 6 giving complete results are
not included in this article because of cost.*


asi of one scale, it tends to obscure differences on other scales. 

* For Tables 2, 3, 4, 5, and 6 order Document 2004 from American Documentation
Institute, 1719 N Street, N.W., Washington 6, D. C., remitting $0.50 for microfilm
(images 1 inch high on standard 35 mm. motion picture film) or $1.00 for photocopies
(6 × 8 inches) readable without optical aid.

The data indicate that the means on the MMPI scales for the various
occupations tend to scatter rather widely about the T score of 50; whereas,
when the mean of all occupations combined is calculated, the mean approaches
rather closely the T score of 50 for each scale. The data would appear to
indicate significant differences between the means of personality patterns 
as related to the various occupational objectives. Perhaps this would 
indicate that extensive differences in personality patterns exist between 
occupational groups at the extremes of the distribution of occupations 
obtained on the MMPI scales. It is found, for example, that the mean 
of personality scores for occupations taken from near the middle of the 
distribution is very significantly different from the means at either ex- 
treme. It is also noted that, since the MMPI scores for the various 
occupational groups tend to spread as indicated in Table 1, the statisti- 
cally not significant differences indicate a degree of difference that possibly 
may be considered as placing each occupation near its optimal level on 
the scale for this particular personality pattern of the MMPI. 

In contrast to a rather common interpretation of the MMPI that a 
score below 70 does not indicate a significant personality deviation, it is 
believed that any individual deviation from the mean T score, either posi- 
tive or negative, on any personality scale is indicative of a certain tendency 
toward behavior in that direction, and that extremes, such as a critical T 
score of 70, are not necessary for the instrument to have definite meaning 
and application in the industrial field. 

The Masculine-Feminine pattern on the Minnesota Multiphasic would 
appear, from these findings, to indicate a “work need” often requiring 
rechanneling in order that the occupational satisfaction of the basic 
Masculine-Feminine content of the total personality be achieved. This 
may be illustrated, for example, in the statistical difference between social 
scientist and farmer, livestock. The “work need” for a social scientist is 
an understanding of the problems of other human beings, the prerequisite 
of which is a high degree of sensitivity as seen in the Masculine-Feminine 
pattern. Another example of this need for redirection of the Masculine- 
Feminine component of the total personality can be seen in the occupation 
of physician, whose “work need” is expressed in his “bedside” manner, as
compared to the low degree of this pattern in the occupation of draftsman,
the difference between which is significant statistically. This dependency
of occupational choice upon the Masculine-Feminine level as indicated
by the MMPI has been empirically tested numerous times by us during
vocational advisement and guidance by attempting to get a person who
was interested in some occupation such as barber or beautician to consider
the objectives of meat cutter or butcher, or vice versa. In all cases there
has been a violent rejection of the consideration of the alternate objective.
Statistical evidence and protocols seem to indicate that professions of a
so-called highly cultural nature require as a fundamental “work need” a
degree of Masculine-Feminine pattern approaching 70 T score on the
Minnesota Multiphasic.


MMPI Personality Patterns for Occupations 563 


The Psychopathic pattern of the Minnesota Multiphasic would appear
to be characterized by aggressiveness or by asocial behavior. Under-
neath this aggressiveness is the raw hostility and destructiveness as
demonstrated by ample clinical evidence. This hostility originates from
childhood reactions to authority in the family situation. Menninger
points out that “The concept of work as drudgery which everyone experi-
ences to some extent and which some persons experience to a very high
degree, is bound up with this resistance to authority” (1).
The “work needs” of the individual personality with a high degree of
this pattern may be illustrated in the choice of occupation such as author,
editor, reporter, or athletic coach. “Purposeless destructiveness and ag-
gressiveness may be molded and guided into the constructive activity of
[...]” (1). This hostility may also be observed in those cases character-
ized by failure due to the unrecognized “work need” in the high Pd pat-
tern of personality, as, for example, in the case of the veteran who was
trying to achieve a father identification in his occupational selection by
entering the same field as his father. The veteran was unsuccessful in
his efforts until returning to the psychologist, wherein it became apparent
from the evidence that hostility between the veteran and his father dated
back to the earliest years. In attempting to alleviate the occupational
[...], a counseling technique was applied whereby the veteran was
advised to return East and spend his entire time, if possible, in close com-
panionship with his father. This recommendation was accepted, and
the veteran a number of weeks later returned with renewed interest and
determination to succeed in his occupational efforts. The removal of the
cause of competition with the father resulted in a freer expression of his
occupational efforts in the same field. This would appear to illustrate the
[...] of “work needs” which must be recognized if a rechanneling of
childhood hostility into an occupational goal is to be successful.
Statistical findings in these data indicate that the difference between
the occupation of athletic coach and the occupation of manager is sig-
nificant, which might be interpreted to mean that the managerial oc-
cupations require a complete rechanneling of hostility in the direction of
activity in management and administration on a more socially ac-
ceptable level, whereas the athletic coach utilizes his aggressiveness mostly
on the work level of physical effort (sports).
The Schizophrenic pattern on the Minnesota Multiphasic in this
study appears to indicate a “work need” wherein the individual does
not have to associate too closely with other people. The mechanism of
isolation is well delineated in the clinical syndrome of Schizophrenia.
This same mechanism appears to be effective to a lesser degree in influenc-
ing persons with a high Sc score in the choice of occupations, as exempli-
fied by the difference between the draftsman as compared to athletic




coach, statistically a very significant difference. The occupation of 
draftsman may be considered as an isolating occupation, whereas the 
occupation of athletic coach requires an essential capacity for dealing
closely with other people. Another example which may be cited shows 
the difference in “work needs” for the occupation of typist, which is 
essentially a social and interpersonal occupation, as compared with the 
draftsman. Thus, occupations indicating a significantly low degree of Sc
pattern would appear to require a “work need” wherein the individual 
may satisfy his gregariousness, as compared to a high degree of Sc pattern 
wherein the occupation requires little association with others in the work 
situation. 

The “work needs” as indicated in the Manic pattern of the Minnesota 
Multiphasic would appear essentially to be an outlet for enthusiasm and 
a high degree of overt activity. In the occupation of radio announcer, as 
compared to electrician, the difference is significant statistically, and may 
afford satisfaction in the occupation by providing an outlet for emotional
and verbal expression. Again, this “work need” is clearly illustrated in 
the occupation of teacher, kindergarten, as compared with the occupation 
of electrical repairman, where the rechanneling of emotional content may 
be observed. Occupations which require dynamic behavior, such as that
of lawyer, are another example, as contrasted with automobile upholsterer,
the difference between which is statistically significant.

Thus, many occupations seem to utilize and perhaps demand a per-
sonality pattern in which there is a great deal of spontaneity and enthusi-
asm expressed, whereas other occupations make little use of this personal-
ity pattern. Lewis’ study on this problem asserted “that there is a rela-
tionship between occupational interests and personality tendencies” (8).


Discussion and Indicated Application 


The “work needs” of the total personality have been presented in this 
investigation with the intent of stimulating further research. The dy- 
namic relationship between the “work needs” of the total personality and 
the selection of occupation would appear to us to be significant. From 
these findings it seems desirable to scrutinize closely occupations in terms 
of “personality demands.” These “personality demands” once established 
could then be matched with the “work needs” of the total personality as indi- 
cated on the Multiphasic patterns in a manner similar to that already estab- 
lished in the occupational realm of job demands and physical capacities 
analysis.

In this study the dynamics of the total personality are viewed in terms 
of their psychogenetic origins and their development through conditioned 
response, a learning phenomenon which may be easily observed in the
mechanism of parental identification.




By utilizing this technique and viewpoint it seems to the authors that
the Minnesota Multiphasic is a fairly sensitive instrument for measuring
total personality “work needs” in relation to the suitability of occupa-
tions having certain “personality demands.”


Received March 18, 1949.


References


Menninger, K. Love against hate. New York: Harcourt, Brace and Company, 1942,
   p. 137.
Harmon, L. R., and Wiener, D. N. Use of the Minnesota Multiphasic Personality
   Inventory in vocational advisement. J. appl. Psychol., 1945, 29, 132-141.
Verniaud, W. M. Occupational differences in the Minnesota Multiphasic Personality
   Inventory. J. appl. Psychol., 1946, 30, 604-613.
Hathaway, S. R., and McKinley, J. C. Manual for the Multiphasic Personality
   Inventory. New York: The Psychological Corporation, 1943.
Scott, I. D. Manual of advisement and guidance. Washington, D. C.: Veterans
   Administration, U. S. Government Printing Office, 1945, pp. 11-52, 83-180.
Dictionary of occupational titles. U. S. Department of Labor and U. S. Employment
   Service, Part I, Definitions of titles; Part II, Titles and codes; Part IV, Entry oc-
   cupational classification, and supplement edition III. Washington, D. C.: U. S.
   Government Printing Office, 1939.
Lewis, J. A. Kuder preference record and MMPI scores for two occupational
   groups. J. consult. Psychol., 1947, 11, 194-201.


Correcting Special Ability Test Scores 
for General Ability 


Abraham S. Levine 
University of Minnesota 


Most paper and pencil tests designed to measure special abilities or 
aptitudes correlate positively in varying degrees with tests of general 
ability or intelligence. This fact does not particularly detract and may 
actually enhance the predictive efficiency of special ability tests for oc- 
cupations in which success is positively related to general ability. How- 
ever, there are a large number of jobs particularly in the semi-skilled 
trades which do not require more than a modest level of general intelli- 
gence and for which a high degree of such ability has actually been shown 
to be related to high turnover rates. In these occupations the best pre- 
dictors for the most part have been apparatus tests which are negligibly 
correlated with tests of general or verbal intelligence. For reasons of 
economy it is desirable wherever possible to administer group paper and 
pencil tests rather than individual apparatus tests. Therefore, if the 
effect of general ability could be partialed out, the utility of these con- 
taminated paper and pencil tests as guidance and selection instruments 
for the relatively low I.Q. occupations may be increased. 

The proposed method of correcting for the effect of general intelli- 
gence in a special ability test represents a simple application of the re- 
gression coefficient. A regression coefficient enables one to estimate 
the scores on a test if one knows the scores on another test with which it is 
correlated and the magnitude of this correlation coefficient for a given 
sample. For the sake of illustration, let us choose two tests: (1) a gen- 
eral ability test such as the Tiffin and Lawshe Adaptability Test; and (2) 
a special ability test such as the Bennett Test of Mechanical Compre- 
hension. Let us say that John Black obtained a score on the Adapt- 
ability test which was one and one-half standard deviations above the 
mean of a specified group, and there was a +.50 product moment r be-
tween the Adaptability and Mechanical Comprehension tests for this
group. The best estimate of John’s score on the Mechanical Compre-
hension test would be .50 × (+1.5) or a standard score of +.75.

The proposed correction for general ability subtracts or adds to the
special ability test score an amount defined by the size of the regression
coefficient and the deviation of a general ability test score from the mean:






for example, if John Black actually obtained a standard score of .00
(mean score) on the Mechanical Comprehension test, the part contrib-
uted by his +1.5 standard score on the Adaptability test could be
roughly corrected for by subtracting .50 × (+1.5) from his Mechanical
Comprehension standard score, thereby assigning him a corrected stand-
ard score on the latter test of −.75.

The above correction principle may be expressed by the following
formula, providing that all scores are converted into standard score units:
Corrected Special Ability Score = Special Ability Score − r × General
Ability Score.

The effect of using this correction formula is to raise the special
ability score of an individual who is below the mean on the general ability
test and to lower this score for an individual who is above the mean on
the general ability test. The raising or lowering is of an amount propor-
tional to the relationship between the two tests and the deviation from
the mean on the general ability test. This correction formula serves to
reduce the correlation between general ability test scores and corrected
special ability test scores to zero, thereby eliminating the variance con-
tributed by so-called general intelligence from a test designed to measure
special ability or aptitude. Application of the correction would tend to
reduce such disconcerting phenomena as bright but mechanically inept
individuals scoring high on the Army Mechanical Ability Test and dull
but mechanically gifted garage mechanics scoring low on this test by
reason of the aggravatingly high relationship between scores on the
Mechanical Aptitude Test and the General Classification Test.
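
The correction formula may be illustrated computationally. The following is a minimal sketch in Python and is not part of the original article: the raw scores are hypothetical, and the function names (standardize, pearson_r, corrected_scores) are introduced here only for illustration. It reproduces the John Black example and then checks, on the hypothetical group, that the corrected scores no longer covary with the general ability scores.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Convert raw scores to standard (z) scores within the group."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]

def pearson_r(z_x, z_y):
    """Product-moment r; for standard scores it is the mean cross-product."""
    return sum(a * b for a, b in zip(z_x, z_y)) / len(z_x)

def corrected_scores(z_special, z_general, r):
    """Corrected special ability score = special score - r * general score."""
    return [zs - r * zg for zs, zg in zip(z_special, z_general)]

# John Black's case from the text: special = .00, general = +1.5, r = +.50
john_corrected = corrected_scores([0.0], [1.5], 0.50)[0]

# Hypothetical raw scores for a small group (not data from the article)
general_raw = [12, 15, 9, 20, 17, 11, 14, 18]
special_raw = [30, 34, 25, 41, 33, 31, 29, 40]

z_general = standardize(general_raw)
z_special = standardize(special_raw)
r = pearson_r(z_special, z_general)
corrected = corrected_scores(z_special, z_general, r)

# Mean cross-product of corrected scores with general ability scores;
# a value of zero means the two are uncorrelated.
residual = sum(c * g for c, g in zip(corrected, z_general)) / len(z_general)

print(f"John Black corrected standard score: {john_corrected:+.2f}")
print(f"r before correction: {r:+.2f}; cross-product after: {residual:+.2f}")
```

It should be noted that the corrected scores are no longer in unit-variance form; their variance is reduced by the portion attributable to the general ability test.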
Incidentally, if corrected scores are to be computed for a large number
of individuals, considerable economy may be effected by constructing
tables which will enable one to read the corrected scores in either raw or
standard score form directly from the table. For any given r a table
can be easily constructed in which the special ability test scores are ar-
ranged in progression along the vertical and the general ability test scores
along the horizontal, or vice versa, and the corrected scores found at the
point of intersection in the table.
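
A table of the kind described might be built as in the following sketch, which is offered only as an illustration and assumes standard scores on a hypothetical grid from −2.0 to +2.0 in steps of 0.5. The function name correction_table and the grid are assumptions introduced here, not part of the original article.

```python
def correction_table(r, special_levels, general_levels):
    """Return {special: {general: corrected}} for one fixed r (standard scores)."""
    return {
        zs: {zg: round(zs - r * zg, 2) for zg in general_levels}
        for zs in special_levels
    }

# Hypothetical grid of standard scores from -2.0 to +2.0 in steps of 0.5
levels = [step / 2 for step in range(-4, 5)]
table = correction_table(0.50, levels, levels)

# Special ability scores run down the side, general ability scores across
print("special\\general " + " ".join(f"{zg:+6.1f}" for zg in levels))
for zs in levels:
    print(f"{zs:+15.1f} " + " ".join(f"{table[zs][zg]:+6.2f}" for zg in levels))

# Reading the table for John Black: special .00, general +1.5
john = table[0.0][1.5]
```

Entering the table at a special ability score of .00 and a general ability score of +1.5 gives the corrected score of −.75 worked out in the text.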
Corrected scores should be used only if the following conditions are 




3. Where there is an empirically demonstrated relationship between 
corrected scores and occupational success in excess of simply using the 
uncorrected special ability test scores. Since there is more work involved 
in obtaining a corrected score, it should justify itself by adding to the 
predictive efficiency for success in a given job. This is a crucial point 
since the rationale for the correction is primarily a practical one. 

It is anticipated that corrected special ability test scores will find their 
greatest usefulness in the prediction of success in the semi-skilled trades 
and possibly in routine clerical jobs.


Received April 11, 1949. 


The Rorschach Test in Industrial Selection 


Audrey F. Rieger 
Robert N. McMurry & Co., Chicago, Illinois


The place of the Rorschach inkblot test in clinical work has been well
established, and some claim made that it is of value in vocational guid-
ance. Another field to which the test can contribute valuable informa-
tion is that of selection of industrial personnel.

The problem in selection is the choice of a worker who can fulfill the
requirements of experience and ability for the job and who is a good risk
for long-term employment. He must have the necessary skills and also
be able to fit into the organization. Information about his adaptability
to the job and to the company is very difficult to get, although some of it
may be learned from interviews, references, and tests.

Personality questionnaires have often been used as aids in selection.
The applicant, however, is frequently able to tell what the answers imply
and finds it to his advantage to falsify his responses, if necessary, to give
the impression he believes is favored for the position. The use of validat-
ing keys may permit detection of falsification, but they give little idea
of the direction of the distortion.

A projective technique such as the Rorschach test makes falsification
impossible, since the applicant can in no way determine what the exam-
iner is looking for. The applicant must interpret the unstructured
stimuli of the test in his own manner and is unable to determine how to
produce a desired picture.

Projective techniques have some disadvantages, however. They are
usually time-consuming and always require careful interpretation, a
[...] which demands long and careful training. Therefore the cost of
administering these techniques may make it impractical to use them,
except for jobs which involve at least a moderate investment on the part
of the employer.

As a result, few companies, with the exception of large organizations,
can afford to add a Rorschach worker to their staffs. Providing them
with access to the services of one on a consultant basis makes it possible
for these employers to have the benefit of some information about the
personality of the applicant when it is desirable and to pay for it only as
it is needed.

For this reason, the services of a Rorschach worker were made avail-
able to the clients of a firm of personnel consultants. The test was




always given in conjunction with the regular selection technique, which 
is based on a Patterned Interview procedure.¹ Processing of the appli-
cant included paper-and-pencil tests, the interview, and the Rorschach.
The data derived from these sources were then weighted in order to make
a recommendation to the employer with regard to the applicant’s chances
to be successful on the job.

In most instances the Rorschach test is of more value if given in ad- 
vance of the interview; a brief report can then be made to the interviewer, 
with emphasis on clues which he can follow up in the interview. Oc- 
casionally, however, it was not possible to give the test prior to the inter- 
view. At such times, ratings for the job could be assigned independently 
from the interview findings and from the test results. 


Table 1 
Occupations of Subjects 


Occupations Frequency 


Personnel Assistant 
Personnel Director 
Office Work 




Ratings based on the Rorschach results alone involved a comparison 
of the strengths and weaknesses reflected in the test results with the 
specific requirements of the job. For example, an applicant for an 
executive position was considered less promising if his record indicated 
difficulty in organizing abstract material or in controlling his impulses, 
whereas one who showed strength in these areas was more likely to be 
given a more favorable rating.

Under the special conditions of independent ratings, a total of thirty 
applicants were studied. Table 1 shows the occupations represented in 
the group. 

Ratings for each of these subjects were made by the interviewer and 
by the Rorschach worker with the specific job in mind. Table 2 gives 
the distribution of the ratings, from 1 (superior) through 4 (reject). 


1 R. N. McMurry, Handling personnel adjustment in industry. New York: Harper,
1944, pp. 297 + xi.




The coefficient of correlation between the two sets of ratings is +-.75
5, a very significant result. A correlation of this magnitude indi-
cates that use of either procedure will agree quite well with the results of
the other. That the interview alone has great predictive value has pre-
viously been shown.² Using it as the criterion, it can be assumed that
the results of the Rorschach are also valuable for industrial prediction.
It must be noted, however, that after only a brief period of using the
test a correlation of this magnitude probably could not be achieved. The
interviewer and the Rorschach worker had been associated over a rela-
tively long period of time and were both familiar with the factors on
which the recommendations were based and the methods used by the
other in weighting these factors. This undoubtedly tended to raise the
correlation.
Table 2 
Scatter Table Showing Relation Between Rorschach Ratings 

and Interviewer’s Ratings of Job Applicants 


Interview Ratings 
Rorschach —— 
Ratings 4 3 2 1 
1 11 
1 10 3 
2 2 


Nevertheless, it must be recognized that use of the Rorschach test by
itself, a practice which is not recommended under any circumstances,


Furthermore, unless care is used, the interpretation
may be influenced by the bias of the examiner.
At the same time, the Rorschach offers unique help in learning many 


j about job licants (particularly at the higher occupational 
Be ane oie in in Le rediction. The Rorschach test can 


Received April 10, 1949.
2 R. N. McMurry, Validating the patterned interview. Personnel, 1947, 24, 263-272.


The Rorschach Test and Occupational Personalities 


Audrey F. Rieger 
Robert N. McMurry & Co., Chicago, Illinois 


The question of differences in personality which may differentiate 
between occupational groups (and therefore aid in the selection of em- 
ployees) has been raised, and some attempts have been made to answer 
it. Dodge, for example (3, 4, 5, 6) found sales and clerical personnel had 
different patterns of scores on the Bernreuter. Paterson and Darley 
(10), using the same instrument, were unable to detect differences in 
their subjects. Verniaud (14), studying saleswomen, clerical workers, 
and optical workers, reports some differences in MMPI scores which she 
says correspond with differences in the occupational requirements. 

Kaback (8), using the group Rorschach method, noted some statistic- 
ally significant differences between pharmacists and accountants. Never- 
theless, she concluded that neither group showed any generalized char- 
acteristics. A less exhaustive study using the same technique, that of 
Harrower and Cox (7), reports some differences between other occupa- 
tional groups. Steiner (13) has summarized the Rorschach literature 
reporting studies of occupational groups. 

The present investigation was designed to study personality patterns 
of certain specific occupational groups as reflected in the individual Ror- 
schach test to determine if differences between such groups do occur and 
if the differences are meaningful in practical situations. 


Subjects 

The opportunity to make this investigation into occupational differ- 
ences in personality arose in the course of routine procedures in the offices 
of an organization of personnel consultants. Applicants for positions 
with client organizations are interviewed (9) by one of the consultants 
and are given such paper-and-pencil tests as seem applicable to the posi- 
tions for which they are being considered. In addition, the applicants 
are given an individual Rorschach test by the writer. The Rorschach 
was adopted for routine use in the employee evaluation program to give 
the interviewer a fairly objective portrait of the personality of the can- 
didate and to aid in the evaluation of the information elicited in the 
interview, telephone checks with previous employers, and tests. On the 
basis of these data, the candidate is then rated with regard to his potential 
value as an employee. 





As a rule, the candidates interviewed have previously been screened
by the employer. The men doing this preliminary screening have been
trained to be alert to clues indicating instability, inadequate intelligence,
and other factors affecting success on the job. As a result, it seems likely
that the candidates, most of whom would have been hired if expert advice
had not been available, represent a better-than-average group of workers.
Hence these subjects cannot be considered as representative of appli-
cants in their respective occupational fields. Moreover, the occupational
classifications, based on the employers’ job descriptions, may appear to be
somewhat arbitrary, as some of the applicants lacked experience in the
field. Since they were believed to have possibilities for the job, however,
it seemed reasonable to include them as subjects for study. Some differ-
ences which might have occurred between more clear-cut occupational
categories may have been obscured as a result, a fact which should not be
overlooked in assessing the results of this study.


Table 1 
Occupational Classification of Subjects 


Sales (technical)          55    19-48
Engineers                  53    21-50
Supervisors, foremen       36    23-56
Administrators             64    24-48
Clerical workers           66    17-45
Personnel workers          24    22-45
Merchandising trainees     32    20-36
Miscellaneous                    21-46


The jobs for which the applicants were being considered form the
basis for classification into occupational groups. These jobs fall into six
categories, with two additional miscellaneous groups. Table 1 summa-
rizes some information about the groups. “Years of experience” noted
in the table refers only to experience for the specific job, rather than years
of work experience in general.

The supervisors were not being considered for eventual promotion
to white collar jobs; they were men who had done well in the shop and
were moving up. They had less formal education than the other sub-
jects and did relatively less well on verbal tests, making their best scores
on non-verbal items. The clerical group includes accountants, statisti-
cians, and others doing similarly complex work. The last two groups,




the trainees and the miscellaneous, lack homogeneity as they include a 
wide range of occupations. 
Methods 

It was not possible to use as subjects only those who were recom- 
mended for employment, as the ratings were based on the results of the 
Rorschach test as well as on the information from the interview and test 
procedure and could not serve as criteria. Furthermore, the other data 
about the applicants were not uniform, the interviewing having been 
done by different individuals, the applicants having taken different tests, 
etc. As a result, none of the data except the Rorschach test scores could
be used. 

A large number of Rorschach scores were tabulated by occupational 
groups. Included were all those given weight in the orthodox interpreta-
tion of the results (1), such as color responses, the approach type, etc.
In addition, the literature was reviewed for statements about occupa- 
tional differences in personality which might be represented by Ror- 
schach components; these were tallied. Finally, other scores which on 
an a priori basis may reflect differences between the groups were also 
studied. 

Table 2 presents the means and standard deviations for all groups for 
the more important scores studied. 

In reviewing the results, the statistical reliability of the scores must
be considered. Group differences may be minimized or completely ob-
scured by lack of dependability of the measures. Unfortunately, no
conclusive evidence has been put forth in the literature to answer this
problem. In general, it is probably true that some of the Rorschach
scores possess a high degree of reliability, and others are less dependable.
It must also be recognized that ratios and difference scores, which repre-
sent relationships between imperfect measures, are less reliable than the
component parts. Differences between groups might be hidden by such
unreliability, and results with these scores must be taken with caution.
Chief among these are FC − (C + CF), W%, etc.

Two statistical methods were used. The first involved testing the
differences between the means of the groups for each score to determine 
if any of the differences were significant (CR at least 3). If such differ- 
ences occurred consistently, it was planned to make up a composite 
picture of the worker in each field based on the means of the scores. Al- 
though this procedure is contrary to the basic idea of the interdependence 
of all behavior in the Rorschach test, it had had at least a limited success 

1 Table 2 may be ordered as Document 2652 from American Documentation Institute,
1719 N Street, N. W., Washington 6, D. C., remitting $0.50 for microfilm (images 1 inch
high on standard 35 mm. motion picture film) or $0.50 for photocopies (6 × 8 inches)
readable without optical aid.




elsewhere (11) and might aid in the development of occupational per-
sonality patterns to be adapted to selection and guidance.
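
The critical ratio used in the first method may be illustrated by the following sketch. It is offered only as an illustration: the article does not state how the standard error of the difference was computed, so the usual formula for two independent groups is assumed here, and the means, standard deviations, and group sizes are hypothetical rather than values from the study.

```python
from math import sqrt

def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by its standard error."""
    # Assumed standard error for independent groups (not stated in the article)
    se_diff = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_diff

# Hypothetical response totals (R) for two groups; not values from the study
cr = critical_ratio(mean_a=30.0, sd_a=12.0, n_a=64,
                    mean_b=20.0, sd_b=9.0, n_b=36)
significant = abs(cr) >= 3  # the criterion named in the text

print(f"CR = {cr:.2f}, significant: {significant}")
```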
The second method involved chi square tests of a number of scores
which are more meaningful if interpreted in relation to each other, an
approach difficult to manage in a study such as this where the clinical
implications of many of the results must be overlooked for lack of in-
formation and an inability to deal with large numbers of subjects on an
individual basis.
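
A chi square test of this general kind may be sketched as follows. The article does not describe how the categories were formed, so the 2 × 2 table below (one occupational group against all others, above or below the median on some score) is purely hypothetical, and the function name chi_square is introduced here only for illustration.

```python
def chi_square(table):
    """Chi square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows are groups, columns are above / below the median
counts = [[10, 20],
          [20, 10]]
stat, df = chi_square(counts)
print(f"chi square = {stat:.2f} with {df} degree(s) of freedom")
```

The statistic would then be referred to a chi square table for the stated degrees of freedom; the sketch stops short of that step.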
To equate one set of scores with another, a number of the measures
were transmuted into a normalized scale of standard or T-scores, with
equal means and standard deviations (2). A T-score on any one scale
bears the same relationship to the distribution of those scores in the total
group as does the same T-score value on any other scale. The scores so
treated were chiefly those making up the Approach Type (W, D, and Dd)
and the Experience Balance (M and C).
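
The conversion to normalized T-scores may be sketched as follows. The procedure of reference (2) is not reproduced in the article, so a common approach is assumed here: each raw score is given its midpoint percentile rank in the total group, the percentile is converted to a normal deviate, and the deviate is placed on a scale with mean 50 and standard deviation 10. The raw scores are hypothetical.

```python
from statistics import NormalDist

def normalized_t_scores(raw):
    """Normalized T-scores (mean 50, SD 10) from midpoint percentile ranks."""
    n = len(raw)
    t_scores = []
    for x in raw:
        below = sum(1 for v in raw if v < x)
        equal = sum(1 for v in raw if v == x)
        percentile = (below + 0.5 * equal) / n  # always strictly between 0 and 1
        z = NormalDist().inv_cdf(percentile)    # normal deviate for that rank
        t_scores.append(50 + 10 * z)
    return t_scores

# Hypothetical raw counts on two scales with different units
w_raw = [3, 5, 8, 12, 20]
d_raw = [10, 14, 22, 30, 41]
w_t = normalized_t_scores(w_raw)
d_t = normalized_t_scores(d_raw)

for raw_w, t_w, raw_d, t_d in zip(w_raw, w_t, d_raw, d_t):
    print(f"W={raw_w:>2} -> T={t_w:5.1f}   D={raw_d:>2} -> T={t_d:5.1f}")
```

Because the T-score depends only on a subject's standing in the total group, subjects of equal rank on the two hypothetical scales receive the same T-score, which is the sense in which the scales are equated.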


Results 


The attempt to find personality differences between the occupational
groups had some significant results, but the Rorschach scores, as tested
here, would fail to differentiate the groups in practice. Most of the
differences appear to be related to variations in response total; for ex-
ample, if R is high, W, D, or Dd is necessarily high in relation to the
scores of other subjects with low R. Whether differences which are
dependent on variations in R can be considered as real differences is a
question which requires further study.

Tables 3, 4, and 5 summarize the significant differences between
groups. The chi square tests support some of the results but fail to add
much new information.

Only two groups of subjects tend to stand out from the remainder.
These are the administrative group and the supervisors and foremen.

The administrators are characterized chiefly by their facility in pro-
ducing and handling ideas (high R, low A%, etc.). Scores of this group
indicate complexity of structure as well as lability and freedom of ex-
pression. Most of the areas in which these subjects differ from the re-
mainder of the groups appear to be dependent on superior verbal facility,
however, since the significance of the differences disappears when R is
taken into account.
The supervisors and foremen form a fairly homogeneous classification
and appear to be truly different from the other subjects. The group of
supervisors, however, would probably stand out less noticeably if com-
pared with similar workers; this is the only group of subjects not of the
white collar or professional classes, a fact which must not be overlooked
in reviewing their scores.

For brevity, the tables are omitted. The writer will be glad to supply the informa-
tion on request.

The supervisors are characterized by: limitation of ideas (low R), 
narrow range of interests (high A%), rigidity in judgment (high F + %), 
and restricted emotional life (low M and C). They are ill at ease in close 
associations with others (H-Hd low), and they tend to avoid contacts 
with others, even superficial relationships (H% low). 

These restrictions may be explained by a number of factors. Prob- 
ably most important is the fact that these men work chiefly with their 
hands and rarely deal with verbal concepts. Their weakness in verbal 
matters is evidenced by the relatively poor showing in verbal tests. In 
addition, the relatively impoverished background must be noted, with 
emphasis on the lower level of education. Finally there is the possibility 
that the personality may be reflected in the choice of occupation. 

Although some traits seem to differentiate between the other groups 
(e.g., the salesmen seem to be more concerned with problems of health
than are the other groups), these differences in scores may result from 
chance factors in the selection of the applicants rather than from specific 
group differences. The chief exception to this is the suggestion that the 
engineers show less interest in other people (H% low), a finding which 
corroborates the results of many other studies. In the case of the ad- 
ministrators also, the personality characteristics noted here can be related 
to some extent to the requirements of the job. 


Discussion 


The fact that the individual Rorschach test reflected variations be- 
tween the occupational groups suggests that the Rorschach is sensitive 
to differences between groups. The test probably lacks reliability when 
used statistically to study differences between groups, however, and may 
therefore hide or distort some actual differences. 

Another limitation is the selection of above-average rather than
representative subjects. These may be more similar than would be subjects
chosen at random.

Furthermore, the occupational classifications used in this study were 
not homogeneous categories of jobs. Although they were based on the 
employers’ job titles, the duties under each heading varied widely.
“Salesman,” for example, might mean one who sold machinery or one
who merely created good will for his employer’s products. As a result,
the heterogeneity of the jobs within each classification probably lessened
the chances of turning up significant differences.

The fact that so few occupations could be differentiated in this
investigation is of significance for selection and guidance. It is unlikely
that the technique is wholly at fault in not turning up differences, since
some consistent differences were noted. In some rare instances, such as
cases where the individual possesses some special talent, the choice of
occupation is determined by the talent. It is more likely that the
occupation in which the worker spends most of his life is almost a matter of
chance, determined by opportunity, rather than an end result directed
by specific personality structure.

Rorschach Test and Occupational Personalities 577

If this is true, no occupation can be said to draw people of similar
personality makeup, although it may influence them to the extent that
they later appear similar. There are a few exceptions to this, such as
certain research fields (12). In general, however, any single personality
pattern can be fitted into a number of jobs which may appear to differ
greatly in demands on the individual; this is indicated by the great
overlapping of scores between groups, even when the means differed
significantly. No single personality type can be associated with any of the
occupational groups, nor can it be assumed that any particular type of
personality occurs to excess in any occupational group.

Recommendations for hiring must be based not on a general pattern
for an occupation but on the specific requirements of the job and its place
within a functioning organization. Here the Rorschach test can be of
great value in pointing out the applicant’s strengths and weaknesses,
with due consideration to the part he will play in the particular
organization and without concern for a generalized occupational pattern.


Summary 


A study of several occupational groups by means of the individual
Rorschach test showed a few statistically significant differences between
groups. The only important result is the distinction found between those
who deal with verbal concepts (chiefly administrators but including
salesmen, engineers, clerical workers, and personnel workers) and those
who work with their hands (supervisors and foremen). Personality
patterns cannot be reliably used for placement, selection, and guidance.


These findings (the lack of patterns) should not be construed as a denial
of the value of the descriptive elements of the Rorschach results
in selection and guidance.
Received February 23, 1949.


References 

1. Beck, S. J. Rorschach’s test. New York: Grune & Stratton, 1944-1945, 2 vol.

2. Cronbach, L. J. A statistical method for treatment of limited patterns of scores.
Unpublished MS, Univ. Chicago, 1948.

3. Dodge, A. F. Social dominance and sales personality. J. appl. Psychol., 1938, 22,
132-135.

4. Dodge, A. F. What are the personality traits of the successful sales person? J.
appl. Psychol., 1938, 22, 229-238.

5. Dodge, A. F. What are the personality traits of the successful clerical worker?
J. appl. Psychol., 1940, 24, 576-586.

6. Dodge, A. F. Characteristics of good clerks. Person. J., 1942, 20, 324-327.

7. Harrower, G. J., and Cox, K. J. The results obtained from a number of
occupational groupings on the professional level with the Rorschach group method.
Bull. Can. Psychol. Assn., 1942, 2, 31-33.

8. Kaback, Goldie Ruth. Vocational personalities. Teach. Coll. Contr. Educ., No.
924. New York: Columbia Univ. Press, 1946.

9. McMurry, R. N. Handling personnel adjustment in industry. New York: Harper,
1944.

10. Paterson, D. G., and Darley, J. Men, women, and jobs. Minneapolis: Univ. Minn.
Press, 1936.

11. Rieger, Audrey. Rorschach analysis of adolescent groups. Unpublished MS,
Univ. Chicago, 1945.

12. Roe, Anne. Rorschach study of a group of scientists and technicians. J. consult.
Psychol., 1946, 10, 317-327.

13. Steiner, Matilda E. The use of the Rorschach method in industry. Rorschach Res.
Exch., 1947, 11, 46-52.

14. Verniaud, Willie Maud. Occupational differences in the Minnesota Multiphasic
Personality Inventory. J. appl. Psychol., 1946, 30, 604-613.


[illegible] significant variable in various kinds of motor performance. Emphasis
has been placed on the prediction of various skills by the use of various
psychomotor tests and in the work done thus far there has been a tendency
to accept a general factor or component of “steadiness.” Seashore
(3) sums up the conditions necessary for such a factor as follows: “According
to the hypothesis of a group factor for steadiness, those coordinations
which emphasize accuracy (precision or steadiness) while minimizing
speed and strength should cluster together. Such steadiness tests, in
wide variety, should intercorrelate moderately or highly, and show no
correlations with speed and strength tests.”

Various studies have presented evidence supporting the hypothesis of
a group “steadiness” factor. Spaeth and Dunham (5), working with 73
army men, studied the relationship between Dunlap’s test of precision in
thrusting (1) and rifle target shooting. The correlation between the two
for subjects ranging from poor to expert marksmen was .61, a very
significant relationship. Seashore and Adams (4) found test intercorrelations
of .45 or higher with a battery of five steadiness tests (postural
sway with eyes closed, rifle muzzle sway when sighting, hand tremor,
stylus thrusting at holes, and stylus held stationary in holes). Humphreys,
Buxton, and Taylor (2) reported intercorrelations ranging from
[illegible]7 to .69 with a median coefficient of between .52 and .55 between
thrusting steadiness, stationary steadiness, an ataxiameter, and rifle sway.
Relatively little research has been reported concerning the nature of the
factors responsible for differences in performance on various tests measuring
precision of movement and steadiness.


Purpose of Study 


The present study was an attempt to determine the nature of factors
underlying performance on seven measures of visuo-motor co-ordination.


This [illegible] Northwestern University as part of a larger project
[illegible] Seashore. It was subsidized by the Office of Naval
Research under its policy of encouraging basic research. The opinions and
interpretations expressed, however, are those of the authors. The authors wish
to acknowledge their indebtedness to Douglas Ellis, Richard Hetke, and Clarence
Forsberg, who collected the experimental data for the second group of subjects.

579 


580 R. H. Seashore, F. J. Dudek, and W. Holtzman 


These measures emphasized precision of movement of the preferred arm
and hand. The tests used in this battery were selected with several
considerations in mind: 1. they should be relatively uninfluenced by strength
and speed; 2. they should be relatively free from the effects of muscular
fatigue; 3. it should not be possible to get a high score by “trick”
performances; and 4. there should be little practice effect.


Tests Used 


Seven tests were selected for inclusion in the battery. It is recognized 
that these seven tests do not sample, in all probability, the entire range of 
steadiness measures. However, it was not possible to include more vari- 
ables in this battery because of time limitations. The more promising 
tests as determined from the analysis to be described will be included in 
another battery in an attempt to study and define more completely the 
domain of “steadiness.” The seven tests are described below: 


1 and 2. The Universal Ataxiameter: This test was designed to measure the
horizontal and vertical components of involuntary movement of the hand and
forearm. The subject attempted to hold a wooden tab or handle as motionless
as possible. Movements were magnified by means of a leverage system and
photo-electric cells recorded the amount of movement made. Horizontal and
vertical components were scored separately. Five trials of 15 seconds each
were administered in each cycle of tests.

3. Seashore Photoelectric Target Register (Revised): This test emphasized
aiming and constant adjustment of a circular beam of light to a target. From
a mirror which was on the end of a rod controlled by the subject the beam of
light was reflected into a small hole. The beam of light activated a
photo-electric cell which recorded the time the individual was “on the target.” If
the aim was perfect (i.e., the circle of light completely covering the hole) the
counter recorded 10 counts per second; if the beam was only partly on the
target the counts per second were correspondingly less. Five trials of 15
seconds each were administered in each cycle.

4. Straight Tracing Test: This test was a modification of the V-slot tracing
test described by Whipple. At a controlled speed the subject drew a wire
stylus between two brass plates without any base plate. The path formed by
the brass plates was a converging one and the direction of the hand-movement
was toward the body. If the stylus touched either side of the path it activated
a very sensitive Potter Electronic Counter which counted at a rate of 60/sec.
during the time the stylus was in contact. Five trials were administered in each
cycle.

5. Curved Tracing Test: This test was a variant of the straight trace. The
path was of the same width throughout, but it was irregularly curved. The
subject moved a wire stylus from left to right along this curved path. Scores
were obtained on the Potter Electronic Counter as mentioned above. Ten
trials of 15 seconds each were administered.

6. Sine-Curve Rod Tracing Test: In this test the subject moved a ring stylus
of [illegible] inch inside diameter along a brass rod of 3/4 inch diameter. The rod was
bent vertically in the form of a sine curve. The direction of arm-movement was
from left to right. Time of contact between the ring and the rod was recorded
by the Potter Electronic Counter. Speed of movement was controlled so that
each trial required about 30 seconds. Five trials were administered per cycle.


Factorial Analysis of Arm-Hand Precision Tests 581


7. Three-dimensional Rod Tracing Test: This test was constructed and
scored similarly to the previous test, but the rod was bent into an irregular
curve in three dimensions. The ring stylus had a somewhat larger inside
diameter (1 and 3/8 inches) to permit greater freedom of movement. Five
trials requiring approximately 40 seconds each were administered during each
cycle.

8. Thrusting Steadiness Test: This test was a modification of Dunlap’s test
used by Seashore, Adams and others. The subject thrust a stylus into a hole
in time with a metronome at a rate of one thrust every two seconds. Holes of
[illegible] diameters were used, with S making 10 thrusts into each size hole in five
different trials. The score was the number of thrusts made without contacting
the side of the hole.


Method 


This battery of tests was administered under a cycle plan. On one
day the subject went through two successive cycles. Each cycle of tests
required approximately 40 minutes to administer. Forty-eight hours
later the subject returned and repeated two more cycles of the complete
battery. In this way it was possible to obtain measures of reliability
during any one testing period and for test-retest periods. Reliabilities
are indicated in the last three columns of Table 1. The first two of these
columns contain uncorrected, test-retest reliabilities for cycles combined in
various ways. The last column is an estimate, corrected by the Spearman-
Brown formula, of the reliability of the total score (i.e., all four cycles).

The test-retest reliabilities range from .54 for the horizontal component
of the ataxiameter to .85 for the three-dimensional rod test. These
coefficients indicate a rather high degree of stability for day-to-day
measurement of precision of movement and steadiness.¹


Table 1

Intercorrelations, Means, S.D.s, and Reliabilities of Tests in
Steadiness Battery (As computed for Group II, N = 100.)

                                                                        Reliability
               Product-moment r's                            Cycles       Cycles       All
Variable No.   1    2    3    4    5    6    7     M    S.D. 1+2 vs 3+4   1+3 vs 2+4   Cycles*
     1                                           5.19   2.91     54           67          7?
     2        47                                 4.43   2.68     75           84          89
     3        44    ?                            5.95   2.65     79           83          89
     4        39   20   40                       3.10   2.76     7?           82          8?
     5        30   04   26   61                  4.77   3.47     82           95          95
     6        31  -06   26   57   72             4.43   3.12     7?           89          91
     7        22  -07   34   37   72   83        4.08   3.68     85           82          91
     8        31   35   43   66   38   41   49   4.03   2.64     78           88          91
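
The Spearman-Brown correction referred to in connection with the last column of Table 1 can be sketched as follows. This is only an illustration: the function name is hypothetical, and the step of averaging the two uncorrected coefficients before correcting is an assumption, since the text does not say how the two were combined.

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Predicted reliability of a score lengthened k-fold (Spearman-Brown)."""
    return k * r / (1.0 + (k - 1.0) * r)


# Variable 8 (thrusting steadiness): uncorrected coefficients .78 and .88
# as read from Table 1. Averaging them first is an assumption made here.
r_half = (0.78 + 0.88) / 2
r_total = spearman_brown(r_half)
print(round(r_total, 2))
```

Under that assumption the estimate for variable 8 rounds to .91, which agrees with the tabled value for all four cycles.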


Results 

The tests in this battery were administered to 100 volunteer subjects 
selected from elementary psychology classes at Northwestern University. 
All were right-handed males. Since stability of day-to-day measures 
seemed fairly high the scores used for analysis were the sums of scores on 
all performances for any given test. Product-moment correlations were 
then computed for these scores. These intercorrelations are presented in 
Table 1. The variables are, in general, positively correlated. No corre- 
lations are significantly negative though several are not significantly 
greater than zero. 


Table 2 
Centroid and Rotated Factor Loadings of Tests in Steadiness Battery 
Group II (N = 100) Group I (N = 39) 
Rotated 
Centroid Loadings Loadings Rotated Loadings 


Variable ep aE) aaa T S 8 
Atax. Hor.) 1 56 31 22 49 46 23 27 [68 


2 
Targ. Regis. 3 60 32 25 53 52 25 27 |62| 22 03 63 
Str. Trace 4 78 09 —50 83 86 23 10 —03 [66] 11 465 
Curv, Trace 5 72 —40 —09 68 68 46 -03 27 45 46 48 
Sine Trace 6 74 —53 —03 83 84 |si| 41 —0s [84] 20 29 83 
3-dim. Trace 7 72 -56 17 85 86 |g9| 24 00 7 14 74 
Thrust 8 69 17 —17 55 56 2 [6I] 31 19 ø 30 57 


In the data for Group I the horizontal and vertical scores for the Ataxiameter were
combined into a single score.

The matrix of intercorrelations was subjected to a centroid factorial
analysis. The results of this analysis are presented in Table 2. Most
significant here are the two matrices of rotated factor loadings. The
data for Group I were obtained in a preliminary investigation carried out
by Holtzman in the summer of 1946. Only 39 individuals were used as
subjects for this study. Nevertheless, the intercorrelations resulting from
this sample were factorially analyzed with the resulting factors shown in
the right-hand part of the table. In the present study data were gathered
on 100 individuals (designated Group II). It is noteworthy that the
factorial picture is so similar from group to group. Major factor loadings
in each sample have been enclosed in boxes. It is apparent that the
factor structure is practically identical from sample to sample.

¹ Note that the coefficients of reliability which compare trials 1 and 2 of the first day
with trials 3 and 4 of the second day are somewhat lower than the correlations between
trials 1 and 3 vs. trials 2 and 4, which tend to balance out the diurnal variation. G.
Paulsen, J. appl. Psychol., 1935, Vol. 19, pp. 166-79 and pp. 29-42 found, as we do, that
there was a lower intercorrelation between trials on successive days than on the same day,
on one test of hand steadiness. However there was still a very significant correlation
between days.

Three factors were adequate to account for the correlations in each
sample. When these were rotated with criteria of simple structure and
positive manifold in mind the factors resulting from each analysis could
be identified as similar. These factors were identified and named as
follows:

Factor I has primary loadings on the three-dimensional rod test and on
the sine-curve rod test. It has a major loading as well for the curved
tracing test for Group II, but not for Group I. This factor seems
obviously to be associated with steadiness and precision of movement which
involves spatial components in two or more planes.

Factor II has major loadings in the straight tracing test and in the
thrusting steadiness test. It will be recalled that in the straight tracing
test the subject moved a stylus along a straight path toward his body.
In the thrusting test the subject thrust a stylus toward a hole away from
his body, but in a very similar path. This factor, it would seem, is
associated with precision of movement in a restricted plane.

Factor III has major loadings for the target register and for the
ataxiameter scores. The common component here is involuntary
movement of the arm and hand. This factor seems to be what is usually
thought of as “steadiness.”

An examination of the factor loadings suggests that simple structure
has not been ideally achieved and that the factors may be somewhat
correlated. Correlations among the factors were estimated graphically by
determining the cosine of the angular separation between oblique vectors
representing them. Factors I and II are moderately correlated. Factor
III does not seem to be highly correlated with the other two. It would
seem, then, that stationary steadiness is relatively independent of
steadiness where movement is involved. The correlation between Factors I and
II should not be surprising, since both involve movement. The number
of dimensions within which movement takes place seems the important
difference. Factor I included those tests calling for movement in a
three-dimensional space, while Factor II was restricted more or less to
movements in a restricted plane, or two-dimensional space. There is another
possible interpretation based on the amount of movement involved.
Those tests involved in Factor II require relatively shorter movements
than do those appearing in Factor I. (The only anomaly here is the
curved trace test which in this respect is more like the tests appearing on
Factor II.) In this way it might be that Factor I represents tests
involving a general bodily orientation and control whereas Factor II tests
represent finer adjustments within but one member of the body (the hand
and arm). This hypothesis remains to be investigated.
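
The relation used above, in which the correlation between two oblique factors is taken as the cosine of the angle separating their vectors, can be illustrated with a short sketch. The function names and the example angles are hypothetical; the article reports no angles, so the figures below are for illustration only.

```python
import math
from typing import Sequence


def correlation_from_angle(angle_degrees: float) -> float:
    """Correlation between two factors whose axes are separated by this angle."""
    return math.cos(math.radians(angle_degrees))


def cosine_between(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two factor vectors given by coordinates."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical angles: axes 60 degrees apart are moderately correlated,
# axes at right angles are uncorrelated.
print(round(correlation_from_angle(60.0), 2))
print(round(correlation_from_angle(90.0), 2))
```

Reading the separation from a plot, as the authors describe, amounts to supplying the angle by eye; the coordinate form gives the same quantity when the vectors are known numerically.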


Summary 

The results of this investigation suggest several important considera- 
tions. 

1. The intercorrelations among various tests all of which presumably 
depend on steadiness and precision of movement cannot be adequately 
accounted for by postulating a single factor of steadiness. There are, it 
would seem, various components influencing performance on these seem- 
ingly similar kinds of tasks. 

2. There may be yet other factors accounting for scores made on 
steadiness tests measuring only involuntary kinds of arm and hand move- 
ment. This is suggested by the differences between the common factor 
variance and the reliabilities of the tests measuring involuntary movement 
in the present battery. While the reliabilities would indicate from 80 to 
90% of the variance accounted for, common factors seem to account only 
for about 50% of the test variance. This would mean that about one- 
third of the variance is attributable to some specific factor or to a common 
factor yet to be identified by including these tests in another battery with 
other types of tests. 

3. The results indicate that stationary steadiness (or involuntary
movement of the arm and hand) is not highly related to precision of
movement. Those factors, however, which involve spatial components
are related to a greater degree.
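
The comparison in point 2 between reliability and common-factor variance can be worked through for one test. The sketch below takes the Group II centroid loadings of the target register (.60, .32, .25) as they read in Table 2 and its total-score reliability (.89) from Table 1; the function names are hypothetical, and treating the difference as reliable specific variance follows the usual factor-analytic decomposition rather than a computation shown in the article.

```python
from typing import Sequence


def communality(loadings: Sequence[float]) -> float:
    """Common-factor variance: the sum of squared factor loadings."""
    return sum(a * a for a in loadings)


def reliable_specific_variance(reliability: float, loadings: Sequence[float]) -> float:
    """Reliable variance not accounted for by the common factors."""
    return reliability - communality(loadings)


# Target register (variable 3), values as read from Tables 1 and 2.
target_loadings = (0.60, 0.32, 0.25)
target_reliability = 0.89

h2 = communality(target_loadings)
specific = reliable_specific_variance(target_reliability, target_loadings)
print(f"communality = {h2:.2f}, reliable specific variance = {specific:.2f}")
```

For this test the common factors account for roughly half the variance and a little over one-third remains as reliable but unexplained variance, in line with the figures quoted in point 2.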

These results and conclusions have implications in the area of selection
in certain industrial areas. Many kinds of tasks are recognized as
depending upon “precision,” “steadiness,” or “coordination.” However,
it would seem that there are several identifiable components, the
isolation of which would improve our ability to select for the particular
components which are represented in particular jobs. Thus, a lathe operator
and drill-press operator might not require the same kinds of “steadiness.”
Received March 14, 1949.

References 

1. Dunlap, K. Improved forms of steadiness tester and tapping plate. J. exp. Psychol.,
1921, 4, 430-433.

2. Humphreys, L. G., Buxton, C. E., and Taylor, H. R. Steadiness and rifle
marksmanship. J. appl. Psychol., 1936, 20, 680-688.

3. Seashore, R. H. An experimental and theoretical analysis of fine motor skills. Amer.
J. Psychol., 1940, 53, 86-98.

4. Seashore, R. H., and Adams, C. R. The measurement of steadiness: a new apparatus
and results in marksmanship. Science, 1933, 78, 285-287.

5. Spaeth, R. A., and Dunham, G. C. The correlation between motor control and rifle
shooting. Amer. J. Physiol., 1921, 56, 249-256.


What Do Readership Studies Really Prove? * 


H. P. Longstaff and G. P. Laybourn 
University of Minnesota 


Almost every publication seems to have “just made a reader study” on 
the basis of which it seems to be able to prove that it “leads the field in 
readership.” Such violently conflicting claims have become so common 
that many a reader, advertiser, and publisher—confused by this con- 
tradiction of alleged facts—has begun to ask: “What do readership 
studies really prove?” One such perplexed publisher set out to try to 
answer this question. 

Purpose

The purpose of the study conducted by the Putman Publishing Company
was to call attention to fallacies inherent in “readership studies” as
they have been commonly conducted and to suggest the need for more
careful scrutiny of the results of such studies. It was not the purpose of
this investigation to set up an entirely new, flawless technique for the
study of relative readership. This was a purely analytical investigation
of readership-study methods and techniques.


Procedure 


To appraise the validity of the “orthodox” type of readership study, a
procedure was devised to compare the relative readership standings of
three industrial magazines on the basis of three different readership-study
techniques which yielded, respectively, (1) the number of readers based
on the number of “mentions”¹ obtained in response to an original
questionnaire employing “orthodox” readership-study techniques, (2)
the number of readers corrected for “votes”² obtained in response to a
follow-up questionnaire, and (3) the number of readers corrected for
votes and “disqualifying negative comments”³ obtained in response to the
follow-up questionnaire.

* This paper is a condensation and revision of an investigation conducted by Mr.
L. Putman of the Putman Publishing Company, and is published with their
permission. The original study entitled “We Made a Reader Survey” was published by the
above mentioned company. Copies of the original report are available (without charge
to industrial advertisers and their advertising agencies) upon application to the Research
Department, Putman Publishing Co., 737 North Michigan Avenue, Chicago 11, Illinois.

¹ Mention refers to the naming of a magazine in reply to the question, “What
magazines do you read?”

² Vote refers to a “YES” response to the question, “Do you read this magazine?”


This procedure entailed the use of three different questionnaires: (1)
an original “orthodox” type of questionnaire, sent to 1,000 known readers
of Magazine A, asking “What magazines do you read?”; (2) a follow-up
questionnaire, sent to those who failed to mention Magazine A in response
to the original questionnaire, asking “Do you read Magazine A?” and
providing for comments on Magazine A; and (3) a follow-up questionnaire,
sent to those who failed to mention Magazine B and/or Magazine
C in response to the original questionnaire, asking either “Do you read
Magazine B?” or “Do you read Magazine C?” and providing for comments
on either Magazine B or Magazine C.

The First Questionnaire. The original questionnaire appeared on the
letterhead of an independent manufacturer, who supposedly was attempting
to determine in what industrial magazines he should place some
advertising, and requested the addressee to write down the names of the
industrial, business, or trade magazines which he reads. This first
questionnaire was mailed to 1,000 “known readers” of Magazine A whose
names and addresses had been copied from response slips⁴ which readers⁵
had taken from issues of Magazine A. Thus, Questionnaire No. 1
provided a check of what known readers say they read, after they had proved
their readership of one publication, without knowing that the questioner
knew anything about what they had read.

The Second Questionnaire. Since it was found that over one-half of the
“known readers” of Magazine A who replied to Questionnaire No. 1
failed to mention Magazine A, it was decided to send them a follow-up
questionnaire in order to discover why they had failed to do so. This
second questionnaire appeared on the letterhead of the same manufacturer,
who supposedly wondered whether the addressee’s failure to mention
Magazine A in replying to the first questionnaire was merely an oversight,
and requested the addressee to indicate whether or not he read this
magazine and provided a space for him to comment upon it. Thus,
Questionnaire No. 2 was sent to those persons who, in replying to the
original questionnaire, had failed to mention Magazine A.

³ Disqualifying negative comment refers to a comment which indicates that a
respondent who replied “YES” to the question, “Do you read this magazine?” actually does not
read it.

⁴ These slips had been inserted into copies of Magazine A; readers filled in subjects on
which they wished more information, signed their names, and mailed the slips to the
publisher.

⁵ These readers were not necessarily subscribers to Magazine A: in several cases it was
apparent that these inquirers had sent slips taken from copies of the magazine received
by someone else.




The Third Questionnaire. Since it was found that nine out of every
ten respondents replying to Questionnaire No. 2 reported that they read
Magazine A (despite their failure to mention Magazine A in response to
Questionnaire No. 1), it was decided to duplicate as closely as possible
the conditions under which the second questionnaire had been sent out
asking about Magazine A by sending out a third questionnaire asking
about Magazine B and Magazine C. In order that no one would receive
more than one follow-up questionnaire, Questionnaire No. 3 was not sent
to those who had been sent Questionnaire No. 2, i.e., those who had failed
to mention Magazine A. Thus, the third questionnaire was sent to
those who, in replying to the original questionnaire, had failed to mention
Magazine B and/or Magazine C but had mentioned Magazine A.
Questionnaire No. 3 duplicated the conditions of Questionnaire No. 2 as closely
as possible with the letter as nearly identical as possible. Each
questionnaire dealt with only one of the two magazines, Magazine B or
Magazine C, and requested the addressee to indicate whether or not he read
the magazine and provided a space for him to comment upon it.
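
The routing rule for the two follow-up questionnaires, as described above, can be summarized in a short sketch. The function name and return format are hypothetical. Where a respondent failed to mention both Magazine B and Magazine C, the text does not say which of the two he was asked about, so the sketch returns both as candidates rather than choosing one.

```python
from typing import List, Optional, Set, Tuple


def follow_up(mentions: Set[str]) -> Optional[Tuple[str, List[str]]]:
    """Return the follow-up questionnaire and the candidate magazine(s) it may ask about.

    `mentions` holds the magazines named in reply to Questionnaire No. 1.
    """
    if "A" not in mentions:
        # Failed to mention Magazine A: gets No. 2 and never No. 3.
        return ("Questionnaire No. 2", ["A"])
    missing = [m for m in ("B", "C") if m not in mentions]
    if missing:
        # Mentioned A but not B and/or C: gets No. 3 about one of these.
        return ("Questionnaire No. 3", missing)
    return None  # mentioned all three magazines: no follow-up


print(follow_up({"B", "C"}))
print(follow_up({"A", "C"}))
print(follow_up({"A", "B", "C"}))
```

Checking for Magazine A first is what guarantees that no respondent receives more than one follow-up questionnaire.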


Results and Discussion 


questionnaires, and (3) the interpretations necessitated by the
comments accompanying the replies to the follow-up questionnaires.
The data obtained in response to each of the three questionnaires are
presented in Table 1.

Table 1
Response to Each of the Three Questionnaires

Original “Orthodox” Questionnaire: “What magazines do you read?”

                                      1st Questionnaire
                           Magazine A       Magazine B       Magazine C
Replies                    No.  Per Cent    No.  Per Cent    No.  Per Cent
Mentioning as Read         277    47.3      309    52.8      295    50.4
Not Mentioning as Read     308    52.7      276    47.2      290    49.6
Total                      585   100.0      585   100.0      585   100.0

Follow-up Questionnaires: “Do you read this magazine?”

                           2nd Questionnaire        3rd Questionnaire
                           Magazine A       Magazine B       Magazine C
Replies                    No.  Per Cent    No.  Per Cent    No.  Per Cent
YES                        203    91.4       23    41.0       30    52.6
“Occasionally”               5     2.2        6    10.7        5     8.8
NO                          13     5.9       27    48.3       21    36.8
No Reply                     1     0.5        0     0.0        1     1.8
Total                      222   100.0       56   100.0       57   100.0


The First Questionnaire. In response to the 1,000 letters mailed to
“known readers” of Magazine A, 585 replies were received, a 58.5%
response. Referring to Table 1, it will be noted that, of these 585
respondents “100% salted” for Magazine A, only 47.3% mentioned
Magazine A. Thus, on the basis of the original questionnaire, Magazine A
ranked third in readership.

The Second Questionnaire. Of the 291 who had failed to mention
Magazine A in their replies to the first questionnaire and who were
addressed with Questionnaire No. 2, 222 replied to the second questionnaire,
a 76.2% response. Referring to Table 1, it will be noticed that, of
these 222 respondents, 91.4% replied “YES” to the question, “Do you
read Magazine A?”

Why did so many of these respondents fail to mention Magazine A in
replying to the first questionnaire? How could so many “change their
minds” in replying to the second questionnaire? Surprisingly, 150 of
those replying to Questionnaire No. 2 made some sort of comment;
surprisingly, because such comment, while suggested, was not specifically
asked for. This voluntary comment of these readers was perhaps the
most revealing part of the response to the second questionnaire. Some
typical comments follow:


“I have received Magazine A for nine years and have asked for more
information numberless times.”
“Overlooked this originally—I know of at least 15 other men in our
organization who read Magazine A regularly.”
“Sorry to have overlooked Magazine A as this is really one of my favorite
magazines along with Magazine Æ and really like it very much.”


The Third Questionnaire. Of the 74 who had failed to mention
Magazine B in their replies to the first questionnaire and who were
addressed with Questionnaire No. 3, 56 replied to the third questionnaire,
a 75.7% response; and, of the 74 who had failed to mention Magazine C
in their replies to the first questionnaire and who were addressed with
Questionnaire No. 3, 57 replied to the third questionnaire, a 77.0%
response. Again referring to Table 1, it will be observed that, of those
replying to Questionnaire No. 3, 41.0% of those who had failed to mention
Magazine B in the first questionnaire replied “YES” to the question,
“Do you read Magazine B?”; and 52.6% of those who had failed to
mention Magazine C in the first questionnaire replied “YES” to the
question, “Do you read Magazine C?”


Effect of Combining Results 


We have just seen that the percentages of “YES” replies to the
question, “Do you read this magazine?” asked in the follow-up
questionnaires, were 91.4, 41.0, and 52.6, respectively, for Magazine A,
Magazine B, and Magazine C. Taking these percentages of each
publication’s “replies-failing-to-mention” in response to the original
questionnaire, we note the following:


Percentage Saying They     Number Failing to Mention
Read in Response to        in Response to Original      Additional
Follow-up Questionnaire    Questionnaire                Readers
        91.4                      308                      282
        41.0                      276                      113
        52.6                      290                      153


Then, adding the results of the follow-up questionnaires to the results
of the original questionnaire, we arrive at the following:


Number of Readers      Number of Additional Readers
from Original          from Follow-up                   Total
Questionnaire          Questionnaires                   Readers
      277                     282                         559
      309                     113                         422
      295                     153                         448
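The arithmetic behind the two tabulations above can be checked with a short sketch. The variable and function names are illustrative and do not come from the original report, and rounding to the nearest whole reader is an assumption that happens to reproduce the printed figures.

```python
# Figures quoted in the text for Magazines A, B, and C.
original_mentions = {"A": 277, "B": 309, "C": 295}    # readers from the original questionnaire
failed_to_mention = {"A": 308, "B": 276, "C": 290}    # replies that did not mention the magazine
yes_percent = {"A": 91.4, "B": 41.0, "C": 52.6}       # "YES" rate on the follow-up questionnaire


def additional_readers(pct_yes, n_failed):
    """Apply the follow-up "YES" rate to all replies failing to mention.

    Rounding to the nearest whole reader is an assumption.
    """
    return round(pct_yes / 100 * n_failed)


totals = {}
for magazine, mentions in original_mentions.items():
    extra = additional_readers(yes_percent[magazine], failed_to_mention[magazine])
    totals[magazine] = mentions + extra

print(totals)
```

Note that each magazine's mentions and failures to mention sum to the same 585 replies, so the follow-up percentage is being projected onto everyone who did not mention the magazine, not only onto those who answered the follow-up.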


Thus, contrasting the results of the original “orthodox” study with
the final figures obtained from all three questionnaires, we note the
following inversion in the ranks of the three publications:


                 Results from Original        Figures from All Three
                  “Orthodox” Study               Questionnaires
                 Readers      Rank            Readers      Rank
Magazine A         277        3rd               559        1st
Magazine C         295        2nd               448        2nd
Magazine B         309        1st               422        3rd


It should be noted that these final figures are not presented as accurate
measurements of the relative readerships of these three magazines.
However, it is believed that these final figures, contrasted with the
figures of the first “orthodox” questionnaire, give evidence of the
fallacies inherent in such “orthodox” readership studies.




Interpretations Necessitated by the Comments 


What is perhaps the most revealing part of this entire investigation
is found in the comments which were made by those replying to the
follow-up questionnaires as to their readership of, and their opinions of,
these publications.*

Comparison of Comments of Those Replying to Follow-up Questionnaires.
The comments accompanying the replies in response to Questionnaire
No. 2 (regarding Magazine A) and to Questionnaire No. 3 (regarding
Magazine B or Magazine C) may be categorized as follows: (1) Favorable,
i.e., the comment definitely reveals the respondent’s approval of the
publication and/or his actual readership of it; (2) Negative, i.e., the
comment definitely shows that the respondent either does not actually
read the publication regularly or does not feel that it is valuable to him;
and (3) Non-committal, i.e., the comment tells nothing definite as to the
respondent’s actual readership or his personal opinion of the magazine’s
value to him.

Table 2
Comparison of Comments Accompanying Replies
in Response to Follow-up Questionnaires

                                    In response to:
                      2nd Questionnaire          3rd Questionnaire
                         Magazine A         Magazine B        Magazine C
Replies                No.   Per Cent     No.   Per Cent    No.   Per Cent
With Comments          150     67.5        27     48.2       44     77.1
Without Comments        72     32.5        29     51.8       13     22.9
Total Replies          222    100.0        56    100.0       57    100.0

Comments               No.   Per Cent     No.   Per Cent    No.   Per Cent
Favorable              102     68.0         6     22.2       10     22.7
Negative                11      7.3        18     66.6       29     65.9
Non-committal           37     24.7         3     11.2        5     11.4
Total Comments         150    100.0        27    100.0       44    100.0


Referring to Table 2, it will be seen that Magazine A elicited a strik- 
ingly larger percentage of “favorable” comments and a strikingly smaller 
percentage of “negative” comments than did either Magazine B or
Magazine C. From this tabulation there seems to be strong evidence, 
first, in the case of Magazine A, that actual comments show far greater 


* A complete tabulation of all comments is presented in the original report, We made
a reader survey, pp. 22-43.




active readership than mentions on the original questionnaire showed;
second, in the cases of both Magazine B and Magazine C, that actual
comments show far lower active readership than both mentions on the
original questionnaire and votes on the follow-up questionnaire showed.
Thus, it would seem that, in a readership study in which a publication’s
readership depended upon the number of mentions it receives, there would
be a marked tendency for the readership of Magazine A to be
under-appraised and the readerships of Magazine B and Magazine C to be
over-appraised.

“My recent reading, I am ashamed to say, has been sadly neglected
because of a heavy work load. Consequently I have temporarily passed up
this magazine. In my opinion it is a good one.”
“I read this magazine occasionally but have never inquired about items
of interest in its advertisements. It takes too much time to read the
magazine and carefully go through its advertisements.”
“Not too helpful in my line of work but have obtained some information
from it at various times.”


If we accept the principle that such negative comments disqualify
“YES” votes, then, for each of the publications, we may deduct from
the number of respondents voting “YES” when asked in the follow-up
questionnaire “Do you read this magazine?” the number of such
respondents whose comments disqualify their “YES” votes, yielding the
number and percentage of “YES” votes corrected for “disqualifying
negative comments.” Following the method of calculation outlined above
under “Effect of Combining Results,” if we take these corrected
percentages of each publication’s “replies-failing-to-mention” in response
to the original questionnaire, we may obtain the number of additional
readers from the follow-up questionnaires corrected for disqualifying
negative comments. When these are added to the number of readers
obtained in response to the original questionnaire, we arrive at the total
number of readers obtained in response to the original questionnaire
corrected for both votes and disqualifying negative comments in response
to the follow-up questionnaires. These figures together with the ranks of
each of the three magazines are presented in the last two columns of
Table 3, which compares the relative standings of the three publications
on the basis of the three methods of analysis employed.




Table 3
Comparison of the Relative Standings of Three Publications on
the Basis of Three Different Readership-Study Techniques

Relative standings on basis of:
  (a) Number of readers based on “mentions”¹ obtained in response to
      original questionnaire employing “orthodox” readership-study
      techniques
  (b) Number of readers corrected for “votes”² obtained in response
      to follow-up questionnaire
  (c) Number of readers corrected for votes and “disqualifying
      negative comments”³ obtained in response to follow-up
      questionnaire

                     (a)               (b)               (c)
Publication     Readers  Rank     Readers  Rank     Readers  Rank
Magazine B        309    1st        422    3rd        398    3rd
Magazine C        295    2nd        448    2nd        412    2nd
Magazine A        277    3rd        559    1st        550    1st

¹ Mention = the naming of a magazine in reply to the question, “What magazines do
you read?”
² Vote = “YES” response to the question, “Do you read this magazine?”
³ Disqualifying negative comment = comment which indicates that a respondent who
replied “YES” to the question, “Do you read this magazine?” actually does not read it.


It should be noted that none of the figures contained in Table 3 are
presented as accurate measurements of the relative readerships of these 
publications. These contrasting figures, however, do suggest that the 
further these studies are carried, the greater the discrepancy between the 
original “orthodox” technique results and the final figures. 
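How the standings shift from one technique to the next can be traced with a small sketch built on the reader counts of Table 3. The method labels and the `rank` helper are illustrative names, not the authors' terminology.

```python
# Reader counts from Table 3 under the three techniques.
readers = {
    "mentions": {"A": 277, "B": 309, "C": 295},
    "mentions plus votes": {"A": 559, "B": 422, "C": 448},
    "votes less disqualifying comments": {"A": 550, "B": 398, "C": 412},
}


def rank(counts):
    """Return {magazine: rank}, with rank 1 for the largest count."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {magazine: position + 1 for position, magazine in enumerate(ordered)}


ranks = {method: rank(counts) for method, counts in readers.items()}

# Ratio of the fully corrected count to the original count of mentions.
growth = {
    m: readers["votes less disqualifying comments"][m] / readers["mentions"][m]
    for m in ("A", "B", "C")
}

for method, result in ranks.items():
    print(method, result)
print({m: round(g, 2) for m, g in growth.items()})
```

On these figures Magazine A roughly doubles its count once follow-up votes and comments are considered, while Magazines B and C grow far less, which is what moves A from third place to first.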


What Do Readership Studies Really Prove? 


The variations that are revealed in these figures would seem to lay
down a challenge to the commonly accepted belief that one can measure
readership by asking “What magazines do you read?” Both the “votes”
and the comments obtained in the follow-up questionnaires of this
investigation reveal the influences of the human tendencies to say we read
what we feel we are expected to read, to boast of what we read beyond
what we actually do read, and to protect ourselves from possible adverse
criticism by stating that we keep up with what we think is accepted as
“required reading.” It seems apparent, therefore, that asking people
“What do you read?” may measure the relative effectiveness of
publishers’ promotion, publicity, and propaganda over many years, but does
not necessarily measure readership.


Summary 


1. In order to discover to what extent the results of “orthodox”
readership studies are dependable, an attempt was made to compare the
relative readership standings of a number of publications as determined
by three different readership-study techniques.
2. In the “orthodox” type of study employing a questionnaire which
asked “What magazines do you read?” only 47.3% of those replying
mentioned Magazine A even though practically everyone to whom the
questionnaire had been sent was a “known reader” of Magazine A. On
the basis of this original questionnaire, Magazine A ranked third in
readership, Magazine C ranked second, and Magazine B ranked first.
3. When a follow-up questionnaire was sent to those who, in replying
to the original questionnaire, had failed to mention Magazine A, Magazine
B, or Magazine C, asking “Do you read this magazine (Magazine A,
Magazine B, or Magazine C)?” the relative readership standings obtained
in the original questionnaire were reversed.
4. When the comments made on the follow-up questionnaires were
taken into account, the readerships of the three publications in question
were changed still further.
5. It would seem apparent from this investigation, therefore, that the
burden of proof rests on those who conduct “orthodox” readership studies
to prove that their figures are measuring actual readership.


Received April 11, 1949. 


Psychological Factors in Instrument Reading. II. The 
Accuracy of Pointer Position Interpolation as a 
Function of the Distance Between Scale 
Marks and Illumination * 


Walter F. Grether 
Aero Medical Laboratory, Wright-Patterson Air Force Base, Dayton, Ohio 
and 


A. C. Williams, Jr. 
University of Illinois


The reader of instruments is normally expected to obtain values of 
greater precision than the graduations placed upon the instrument scale. 
To accomplish this he must interpolate, that is, estimate the relative 
distance of the pointer from the two scale marks between which it falls 
and assign an appropriate value to this position. The accuracy with 
which this can be done obviously limits the precision with which any given 
scale can be read. The accuracy of such interpolation, moreover, will be
influenced by several variables in the scale design and the conditions of 
reading. For a prediction of reading precision obtainable with different 
instrument designs under various conditions of viewing the effect of the 
significant variables must be known. In the present experiment the 
accuracy of pointer position interpolation was studied as a function of 
(a) diameter of the dial; (b) angular separation of the scale divisions; 
and (c) simulated day- versus night-viewing conditions. It will be shown 
in the presentation of the results that the first two variables can be
reduced to a single one, namely, the length of the arc (in visual angle or
inches) between scale marks.

A problem in scale design to which the present investigation is partic- 
ularly relevant is concerned with the question of how finely a scale should 
be divided in order to provide maximum reading accuracy. In an
investigation by Loucks (5) the legibility of tachometer dials was investigated


* This experiment was carried out at the University of Illinois by Dr. A. C. Williams,
Jr., under a “dollar-a-year” contract with the USAF Air Materiel Command. Dr. W. F.
Grether proposed the study, designed and procured the necessary dials, and prepared
the present report. The basic data have been presented previously in Army Air Forces
Aviation Psychology Program Research Report No. 19, Chapter 7, and in USAF Air
Materiel Command Memorandum Report No. TSEAA-694-1.




using rather short exposure (0.75 sec.). For three dials with
graduations of 100, 50, and 20 RPM respectively the percentage of reading
errors increased as the value and size of the graduations decreased.
From this finding it might be concluded that placing the graduations
rather close together will decrease rather than increase reading accuracy.
The findings of Kappauf, Smith, and Bray (3), and Kappauf and Smith
(4), in experiments where the exposure interval was not limited, disagree
with those of Loucks. In their experiments dials graduated in units
gave greater reading accuracy and speed than dials of the same size but
graduated in 5- or 10-unit steps. However, the superiority of 1-unit
over 5-unit graduations was rather small and not at all in proportion to
the increased number of graduation marks. These latter results are in
agreement with those of an investigation by Grether (1) on the reading of
clock dials. With one minute as the criterion of reading accuracy, dials
with 1-minute graduations gave higher reading accuracy than similar
dials with only 5-minute scale marks.

A possible explanation can be offered for the discrepancy between the
findings of Loucks (5) and later investigators. It is quite probable that
as the number of scale marks is increased more eye fixations are required
to make each reading. By limiting the exposure time and consequently
the number of eye fixations Loucks may have favored those dials with
more widely spaced graduations.

In the study of scale designs it is helpful to distinguish between two
general types of errors encountered in dial reading studies. There are
first the precision errors or errors of interpolation. These can never
exceed in magnitude the value of the smallest interval on the scale. The
second type may be called comprehension or interpretation errors, in which
an incorrect value is assigned to the graduation mark against which the
pointer is being read. Comprehension errors are frequently very large
and are usually some multiple of the minor, intermediate, or major scale
divisions. In a study by Grether (2) of altimeter reading, for example,
[illegible] of the errors were of this latter sort, with errors of 1000 feet being
particularly common. It is important to recognize that many of the dial
reading studies up to the present have been concerned only with the
interpolation type of errors, when in actuality the larger comprehension
errors are far more serious in practical instrument reading situations. It
is quite possible that the presence of a large number of graduation marks
on a dial may greatly increase the probability of large comprehension
errors and thereby nullify the precision which a finely graduated scale



physical length of the scale, the range of values to be covered, and the 
desired accuracy of reading are fixed by the particular application. 
Based upon these requirements the designer must then select values 
(usually 1, 2, 5, or decimal multiples of these) for his scale increments 
which will give him reasonable spacing between graduation marks.

The aim of the present study was to provide the instrument designer 
with data from which to predict how the accuracy of readings will be 
affected by the physical length of the interval into which he sub-divides a 
scale. Measurements were made under two lighting conditions compar- 
able to those under which aircraft instruments are viewed. Emphasis in 
this study was placed on interpolation errors. The more complex
comprehension type of errors were recorded but were relatively few in number
and were not subjected to analysis. 


Apparatus 


For the purpose of this experiment a series of 16 simulated instru- 
ment dials was prepared. A sample dial and pointer are shown in Figure 
1. Four sizes of dials were used as follows: 1, 1⅞, 2¾, and 4 inches in
diameter. The particular dimensions of the two intermediate sizes were 
chosen to duplicate standard aircraft instruments. Each size of dial 
was produced with four different graduation intervals, defined in terms 
of the angular separation between scale marks, as follows: 5, 10, 20, and
40 degrees. Except for the variations in diameter and size of graduation 
intervals, all dials were identical. The intermediate graduation marks 
were ⅛ inch in length and approximately 0.02 inch in width. The
major graduation marks at each end of the scale were the same width but 
% inch in length. The numerals on all dials were 1% inch in height. 
All pointers were 3/32 inch in width and of such a length that the tip 
reached to the inner edge of the shortest graduation marks. All dials 
covered a range of from 0 to 50 units as shown in Figure 1, with graduation 
marks only at the 0, 10, 20, 30, 40, and 50 positions, and numerals only 
at 0 and 50. These dials were engraved on brass plates, which were then 
painted a flat black and the engraved markings filled with yellow fluoresc- 
ing paint (pale yellow in daylight) as used on the latest type of USAF 
instruments. 

The experimental dials were presented singly in a panel opening 30
inches from and perpendicular to the subject’s eyes. Daylight conditions
were simulated with a fluorescent type daylight lamp which provided an
illumination of 45 foot-candles at the panel opening. For simulation of
night conditions the subject’s room was completely darkened and the dial
illuminated with a standard C-5 ultra-violet aircraft instrument panel
light operating at maximum intensity. No means were available for
a quantitative measurement of the brightness of the scale markings
under ultra-violet illumination. Covering the opening in which the
experimental dials were presented was a mechanical shutter operated by
the experimenter.



Fig. 1. Sample dial and pointer (1⅞ inch diameter and 20 degree
angular separation between scale marks).


On the experimenter’s side of the test panel was a carriage on which
four of the dials could be mounted side by side. This carriage rode upon
two horizontal tracks parallel to the screen. To present any one of the
dials the experimenter moved the carriage so that the desired dial would
appear in the panel opening. At the experimenter’s side of the carriage
were four master setting dials 5 inches in diameter. On each of these
dials was a pointer connected to the same shaft as the pointer on the dial
to be read by the subject. On the experimenter’s dials were closely
spaced graduations which made possible accurate settings to one-tenth of
the space between graduations on the subject’s dials.




Also provided at the experimenter’s station was a lever for manual 
operation of the shutter used to expose the dial to the subject. This 
lever was used also to operate an electric timer through a suitable switch. 
Thus, the timer indicated the time during which the shutter remained 
open. Since the experimenter closed the shutter as soon as the dial read- 
ing had been completed, the reading on the clock gave a crude measure of 
the reaction time on each test trial. Several other methods of measuring 
reaction time were tried but found to be unsatisfactory. 

Eighty male college students were used as subjects in this experiment. 
Only men with 20-20 binocular vision (corrected or uncorrected) were 
accepted. The subjects were seated in a chair in front of the screen with 
their eyes 30 inches from the panel opening and with the line of sight per- 
pendicular to the panel opening in order to eliminate parallax. The sub- 
jects were divided into groups of 20, each group being tested on a set of 
four dials. The four dials included one of each diameter and one of each 
graduation interval. Each subject was given a total of 80 trials, equally 
divided among the four dials in a random sequence. Of each group of 20 
subjects, 10 were tested under simulated daylight conditions and the re- 
maining 10 under simulated night conditions. 

A variety of dial settings were chosen so as to represent all portions of
the dial from 0 to 50. The actual numbers to be read were the same for
all dials although the order of presentation was randomized. The
subjects were instructed to read the dials as quickly and accurately as
possible to the nearest whole number. As can be seen in Figure 1, the reading
to the nearest whole number required estimation to the nearest one-tenth
of the distance between graduations.

On each trial the experimenter set the pointer of the dial to be pre- 
sented, then opened the shutter and waited for the subject’s verbal re- 
sponse, following which the shutter was closed and the subject’s reading 
and the clock score recorded. 

Results 


The experimental design resulted in 200 readings on each of the 16
specific dials under each of the lighting conditions. For each reading
both error and time data were obtained. The error data consisted of the
deviations of the readings from the actual settings. These deviations
could be either negative or positive and increased in step intervals of one,
or one-tenth of the space between graduation marks. For purposes of
analysis, however, error distributions were made without regard to sign.
Since the distributions of these errors, and also response times, were
considerably skewed, with the modal error being zero for many of the dials,
the statistical treatment presented in this report is limited to medians
and 75th percentiles. Means were computed for all the data and found
to present the same general picture as the medians, but the values were
inflated because of the skewness.
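The treatment of the error data described above (sign discarded, median and 75th percentile preferred to the mean) can be illustrated with a minimal sketch. The error values below are hypothetical and are not data from the experiment; the "inclusive" percentile method is likewise an assumption, since the report does not say how percentiles were computed.

```python
import statistics

# Hypothetical signed errors for one dial, in tenths of a graduation interval.
signed_errors = [0, 1, -1, 0, 2, 0, -1, 3, 0, 1, -2, 0, 1, 0, -1, 0, 6, 0, 1, -1]

# Sign is discarded; one tenth of an interval is 10 per cent of the interval.
abs_percent = [abs(e) * 10 for e in signed_errors]

median_error = statistics.median(abs_percent)
# quantiles(n=4) returns the three quartile cut points; index 2 is the 75th percentile.
p75_error = statistics.quantiles(abs_percent, n=4, method="inclusive")[2]
mean_error = statistics.mean(abs_percent)

print(median_error, p75_error, mean_error)
```

With a distribution piled up at zero and a long upper tail, a single large error pulls the mean above the median, which is the reason given for reporting medians and 75th percentiles.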

A summary of the error data for the 16 dials is shown in Table 1. In
the third column, it will be noted, the two variables of dial diameter and
angular spacing of the divisions have been reduced to a single variable,
namely, the length of graduation interval defined as the arc between the
inner ends of the shortest scale marks. This value can be described also
as the distance the pointer tip must travel between adjacent graduations.


Table 1
Summary of Data on Accuracy of Pointer Interpolation as Function
of Dial Diameter and Spacing of Scale Divisions

                                 Median     Median     Median     75th Percentile  Median
Dial       Angular   Length of   Error      Error      Error      Error            Error
Diameter,  Spacing,  Inner Arc,  Daylight,  Night,     Combined,  Combined,        Combined,
inches     degrees   inches*     % of       % of       % of       % of             degrees
                                 interval   interval   interval   interval

1             5        .032       21.8       31.0       26.4       45.5             1.32
1            10        .065       17.8       18.8       18.3       28.5             1.83
1            20        .130       14.3       14.9       14.6       21.2             2.92
1            40        .261       13.2       12.1       12.6       17.6             5.04
1⅞            5        .070       20.0       19.4       19.7       31.8             0.99
1⅞           10        .141       11.2       14.2       12.8       18.5             1.28
1⅞           20        .283       12.4       10.3       11.4       17.1             2.28
1⅞           40        .567        7.8        8.1        8.0       13.9             3.20
2¾            5        .109       15.4       15.7       15.5       21.2             0.78
2¾           10        .218       12.0        9.3       10.6       16.5             1.06
2¾           20        .436        9.4        9.1        9.3       15.2             1.86
2¾           40        .872        8.1        8.1        8.1       14.1             3.24
4             5        .163       14.9       13.9       14.4       20.0             0.72
4            10        .327        9.7        9.0        9.3       15.2             0.93
4            20        .654        8.8        7.9        8.3       14.3             1.66
4            40       1.309        9.8        9.1        9.4       15.0             3.76

* For conversion to minutes of visual angle multiply by 114.7.


This length of the inner arc is presented in inches with the multiplying
factor provided for conversion to minutes of visual angle for the 30-inch
viewing distance. A comparison of the columns of median errors for
daylight and night conditions in Table 1 reveals no consistent difference
between these two lighting conditions. Only for the dial with the most
closely spaced divisions does the performance under daylight appear to be
superior. For this reason it seemed safe to combine the two sets of error
data in the remaining columns of the table.
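The derived columns of Table 1 can be approximated from the dial geometry. The sketch below is not the authors' computation: it assumes the inner ends of the shortest scale marks lie one-eighth inch inside the dial's nominal radius, an inset inferred here because it reproduces the tabulated arc lengths, and all function names are illustrative.

```python
import math

MARK_INSET_IN = 0.125        # assumed inset of the inner mark ends, inches
VIEWING_DISTANCE_IN = 30.0   # viewing distance stated in the text, inches


def inner_arc_inches(dial_diameter_in, spacing_deg):
    """Arc length between adjacent marks at the assumed inner radius."""
    inner_radius = dial_diameter_in / 2 - MARK_INSET_IN
    return inner_radius * math.radians(spacing_deg)


def minutes_of_arc_per_inch(distance_in=VIEWING_DISTANCE_IN):
    """Visual angle, in minutes, subtended by one inch at the given distance."""
    return math.degrees(math.atan(1 / distance_in)) * 60


def median_error_degrees(median_pct_of_interval, spacing_deg):
    """Convert an error in per cent of the interval to degrees of arc."""
    return median_pct_of_interval / 100 * spacing_deg


print(inner_arc_inches(1, 5))          # close to the tabulated .032
print(inner_arc_inches(4, 40))         # close to the tabulated 1.309
print(minutes_of_arc_per_inch())       # close to the stated factor of 114.7
print(median_error_degrees(26.4, 5))   # close to the tabulated 1.32
```

The last column of the table is thus a simple rescaling of the combined median error by the angular spacing, which is why wider spacings show larger errors in degrees even where the error as a percentage of the interval is smaller.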




Some of the most important findings contained in Table 1 are
presented graphically in Figures 2 and 3. With length of graduation interval
along the base line, the median and 75th percentile errors are plotted as 
per cent of the interval in Figure 2, and as absolute values in Figure 3. 


Fig. 2. Relative error of interpolation as a function
of length of graduation interval. (Ordinate: error, per cent of interval;
abscissa: length of graduation interval, inches; curves for the 75th
percentile and the 50th percentile, or median.)


Figure 2 will be recognized as a typical Weber function in which threshold 
ratios (ΔI/I) are plotted as a function of the stimulus intensity (I). In
this figure, it will be noted, the relative accuracy of interpolation is very 
nearly constant for graduation intervals above 0.5 inch, with a slight rise 
for the largest interval. 




Fig. 3. Absolute error of interpolation as a function
of length of graduation interval. (Ordinate: median interpolation error,
inches; abscissa: length of graduation interval, inches.)


Table 2
Median Time for Interpolation of Pointer Position as a Function
of Dial Diameter and Spacing of Scale Divisions

Dial                     Graduation Interval
diameter,         5°        10°       20°       40°
inches
            Seconds per reading for daylight conditions
1                1.98      1.78      1.91      1.84
1⅞               1.73      1.80      1.86      1.76
2¾               1.83      1.85      1.73      1.77
4                1.87      1.75      1.68      1.90
            Seconds per reading for night conditions

2.13 

2.10 

2.06 




In Figure 3 the absolute values for interpolation thresholds fall very 
nearly on a straight line, which if extrapolated to zero intervals would 
intercept the ordinate at approximately 0.006 inch. It is apparent from 
this curve that no limit had been reached for absolute accuracy of inter- 
polation in this experiment. The excellence with which the various 
points fit the curves in these two figures indicates that the combination of 
dial diameter and angular separation into the single variable of length of 
graduation interval was justified.
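The near-linear relation and its intercept can be checked roughly from the combined medians of Table 1. The sketch below uses an ordinary least-squares line, which is an assumption: the report does not say how the line in Figure 3 was drawn. The arc for the second dial at 20 degrees is entered as 0.283 inch, the value consistent with the neighboring arcs for that dial. Function names are illustrative.

```python
# (inner arc in inches, combined median error in per cent of interval), from Table 1.
table1 = [
    (0.032, 26.4), (0.065, 18.3), (0.130, 14.6), (0.261, 12.6),
    (0.070, 19.7), (0.141, 12.8), (0.283, 11.4), (0.567, 8.0),
    (0.109, 15.5), (0.218, 10.6), (0.436, 9.3), (0.872, 8.1),
    (0.163, 14.4), (0.327, 9.3), (0.654, 8.3), (1.309, 9.4),
]


def least_squares_line(points):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Absolute median error in inches = arc length times relative error.
absolute = [(arc, arc * pct / 100.0) for arc, pct in table1]
slope, intercept = least_squares_line(absolute)


def predicted_relative_error(arc_in):
    """Relative error (per cent) implied by the fitted line: 100 * (a / x + b)."""
    return 100.0 * (intercept / arc_in + slope)


print(round(slope, 4), round(intercept, 4))
print(round(predicted_relative_error(0.05), 1), round(predicted_relative_error(0.5), 1))
```

A line with a small positive intercept also accounts for the shape of the relative-error curve: dividing a + bx by x gives b + a/x, which is nearly constant for long intervals and rises steeply as the interval shrinks.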

The results of the measurements of response time in this experiment 
are summarized in Table 2. It is apparent that there are no consistent 
relationships between response times and the dial dimensions, although 
the method of measuring response time may have been too crude to demon- 
strate minor relationships that might have been present. On the other 
hand the response times for the night viewing conditions are consistently 
higher than under the daylight conditions. It is quite possible that this 
latter finding was an artifact resulting from a slight delay between open- 
ing of the shutter and the fluorescing of the scale marks. 


Discussion 

Effect of Illumination on Interpolation Accuracy. It is apparent from 
Table 1 that there was no difference in accuracy of dial reading for day 
and night conditions except for the dial with the most closely spaced
graduations. This finding is in general agreement with results obtained
by Spragg and Rock (6) in an investigation of the accuracy of interpola- 
tion as a function of illumination. These investigators found accuracy 
to be almost constant down to a scale mark brightness of 0.022 foot- 
lambert. In the present experiment the brightness of the scale markings 
under the simulated night conditions is estimated to have been consider- 
ably above the 0.022 foot-lambert value below which Spragg and Rock 
found a marked loss in accuracy of interpolation. 

Effect of Separation between Scale Marks on Speed of Reading. It is 
noteworthy that in this experiment no relationship was found between the 
space between graduation marks and speed of reading, although it is 
admitted that the method of measuring speed of reading lacked precision. 
In experiments conducted at Princeton University reading time increased
as the space between marks was decreased, but in all cases there were 
changes in other variables which could have caused the changes in read- 
ing speed. In the first experiment by Kappauf, Smith, and Bray (3) 
the reduction in space between scale marks was accompanied by reductions 
in all other dial dimensions. In the second experiment by Kappauf and 
Smith (4) reduction of the space between marks was accompanied in some 
cases by reduction in all other dimensions, in other cases by an increase 




in total range of values covered by the scale. The question of whether 
or not the separation between scale marks in isolation has any effect on 
speed of dial reading does not appear to have been answered definitely.
Effect of Dial Dimensions on Relative and Absolute Accuracy of
Interpolation. In discussing the findings regarding accuracy of interpolation
the distinction must be constantly kept in mind between accuracy relative
to the interpolation space, and accuracy in absolute units such as degrees
or inches. It is apparent from Figure 2 that there is scarcely any useful
gain in relative accuracy of interpolation as the graduation intervals are
increased beyond 0.25 inch. On the other hand as the intervals are
reduced below this value relative accuracy falls off very rapidly.
Approximately one-fourth (0.25) to one-half (0.50) inch would therefore seem to
be an optimum value for graduation intervals from the standpoint of
relative accuracy.
For maximum accuracy in an absolute sense it would appear from
Figure 3 that the optimum graduation interval, if there is such, is below
the range covered by the present experiment. The data in Figure 3
suggest that the absolute value of interpolation errors might continue to
decrease with decreases in graduation interval until the limit of visual
acuity is reached. If this is true the limit of visual acuity would
determine the optimum graduation interval for maximum accuracy of dial
reading. The data of Loucks (5) and Kappauf, Smith, and Bray (3)
suggest that as the distance between graduation marks is decreased, there
is an increasing tendency to make comprehension errors, that is, assign
the wrong values to scale marks. Also, Kappauf and Smith (4) have
found that increasing the total number of marks on the scale increases the
time required for reading. It would seem, therefore, that there is no
easy answer to the problem of what is the optimum interval size for
instrument scales, but that the optimum interval will vary with reading
criteria.
Summary 
Measurements were made of the accuracy of interpolating pointer 
position between scale marks as a function of dial diameter and the 
angular spacing between divisions. Subjects were required to estimate 
the pointer position to within one-tenth the space between graduations.
The experimental dials were painted with yellow fluorescing paint on a
black background and were read under simulated daylight (45 foot-
candles) and night (ultra-violet) illumination conditions. The major
results of this investigation may be summarized as follows:
1. Dial diameter and angular spacing of the scale marks could be
combined into the single variable of length of graduation interval.

2. The relative error of interpolation decreased as the length of the
graduation interval increased up to approximately 0.5 inch, and was very
nearly constant at higher intervals (see Figure 2).

3. The absolute error of interpolation increased very nearly as a 
linear function of the length of the graduation interval (see Figure 3). 
If there is an optimum interval for absolute accuracy it would appear to be 
below the interval lengths used in this study. 

4. Except in the case of the most closely spaced divisions the accuracy
of interpolation was independent of the two illumination conditions. 

5. The speed of dial reading was not systematically related to either 
dial diameter or angular spacing of the divisions, although the measure- 
ments were admittedly crude. Slower reading under the simulated night 
(ultra-violet) lighting conditions was probably due in part to delay in 
fluorescence of the dial markings.


Received March 18, 1949.
References 


1. Grether, W. F. Factors in the design of clock dials which affect speed and accuracy 
of readings in the 2400-hour time system. J. appl. Psychol., 1948, 32, 159-169. 

2. Grether, W. F. Psychological factors in instrument reading. I. The design of 
long-scale indicators for speed and accuracy of quantitative readings. J. appl. 
Psychol., 1949, 33, 363-372. 

3. Kappauf, W. E., Smith, W. M., and Bray, C. W. Design of instrument dials for 
maximum legibility. I. Development of methodology and some preliminary results. 
USAF Air Materiel Command Memorandum Report No. TSEAA-694-1L, 20 
October 1947. 

4. Kappauf, W. E., and Smith, W. M. Design of instrument dials for maximum 
legibility. II. A preliminary experiment on dial size and graduation. USAF Air 
Materiel Command Memorandum Report No. MCREXD-694-1N, 12 July 1948. 

5. Loucks, R. B. Legibility of aircraft instrument dials: The relative legibility of 
tachometer dials. AAF School of Aviation Medicine, Project No. 265, Report No. 1, 

6. Spragg, S. D. S., and Rock, M. L. Dial reading performance as related to illumination 
variables. I. Intensity. USAF Air Materiel Command Memorandum Report 
No. MCREXD-694-21, 1 October 1948. 


Identification of Cola Beverages. III. A Final Study * 


N. H. Pronko and J. W. Bowles, Jr. 
Wichita University 


In a preliminary study, the authors (2) found that when four Cola 
beverages (Coca Cola, Pepsi Cola, Royal Crown Cola and Vess Cola) 
were presented to Ss to identify, the identification responses of those Ss 
tended to cluster around the three brand names, Coca Cola, Pepsi Cola 
and RC Cola, the fourth sample being consistently labelled with one of 
the three names previously employed. This occurred both when Ss were 
given four samples each of different Cola beverages or four samples of the 
same Cola. 

It was therefore decided to employ only three different Colas on the 
hypothesis that since our Ss apparently did not discriminate the Colas on 
a gustatory basis, their identification responses would be distributed 
among the three brands in an order approximating chance. Such a pro- 
cedure was carried out in Experiment II (1) which showed that whether 
Ss were given three different beverages or the same beverage three differ- 
ent times, their identifications were not essentially different in the two 
cases. It was concluded that when Ss were asked to identify the three 
leading brands of Cola, they might just as well have drawn their names 
out of a hat. 

Another experiment suggested itself as the logical outcome. If Coca 
Cola, Pepsi Cola and RC Cola identification responses were randomly 
distributed when these beverages were actually given serially or otherwise, 
what would the distribution of those responses be when some relatively 
unknown Cola drinks were given instead of those three? Therefore, the 
present experiment utilized three brands of less well known or advertised 
Cola. These were Hyde Park Cola, Kroger Cola, and Spur Cola. The 
last beverage is now, for the first time, being distributed in this 
community although none is as generally available as the three leading 
brands. 

Procedure 
As in Experiment II, the present study employed two groups of Ss, 
96 in Part I and 60 in Part II. For the most part, these were beginning 
students in Elementary Psychology. 

* The writers wish to express their appreciation to Mr. Fred Snyder, Mr. Donald 
Synolds and Mr. Glen Allen for their assistance in the experiment reported here. 




Part I. Each of 96 Ss was admitted individually into the experimental 
room and was invited to sit down. The following instructions were then 
read to him. 

We would like to have you taste and identify some Cola drinks. You will 
be told in what order and when you are to drink them. After you have finished 
each sample, report your identification to E and take enough water from the 
paper cup to rinse your mouth well. 

A tray containing three one-oz. glasses of Hyde Park Cola, Kroger 
Cola and Spur Cola respectively was placed before the S. He was then 
told to drink the beverages labelled x, y, and z in the order indicated to 
him. Samplings were spaced about a minute apart, S’s name and other 
information being recorded in the interval between drinks. 

Order of presentation of beverages, determined preexperimentally, was 
such that each of the three stimuli appeared in the first, second and third 
position 32 times. Such a counter-balanced order was used in order to preclude 
the operation of position effects or stimuli interactions orally. All 
beverages were at all times kept out of sight of Ss and were placed in a 
refrigerant maintained at approximately 5° C. 
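
The counterbalancing described above can be illustrated with a minimal sketch. The report does not say how the orders were actually drawn up; the sketch assumes, purely for illustration, that the six possible orders of the three beverages were each used equally often (16 times among 96 Ss). The names `build_schedule` and `position_counts` are hypothetical.

```python
from collections import Counter
from itertools import permutations

BRANDS = ("Hyde Park", "Kroger", "Spur")


def build_schedule(n_subjects=96):
    """Return one presentation order per subject.

    Assumption (not stated in the report): the 3! = 6 possible orders
    are cycled through so that each is used equally often.
    """
    orders = list(permutations(BRANDS))
    if n_subjects % len(orders) != 0:
        raise ValueError("n_subjects must be a multiple of 6 for exact balance")
    return [orders[i % len(orders)] for i in range(n_subjects)]


def position_counts(schedule):
    """Count how often each brand occupies each serial position (0, 1, 2)."""
    counts = Counter()
    for order in schedule:
        for position, brand in enumerate(order):
            counts[(brand, position)] += 1
    return counts


schedule = build_schedule()
counts = position_counts(schedule)
for brand in BRANDS:
    print(brand, [counts[(brand, p)] for p in range(3)])
```

Under this assumption every beverage falls in each of the three serial positions 32 times, which agrees with the figure given in the Procedure.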

Part II. In Part II 60 Ss were administered the same Cola drinks at 
each of three trials. Thus, 20 were given all Hyde Park Cola; 20, all 
Kroger Cola; and 20, all Spur Cola. 


Results 

Table 1 shows the distribution of the identification responses of the 
96 Ss of part I who were given one-oz. samples each of Hyde Park Cola, 
Kroger Cola and Spur Cola. The most conspicuous finding to be observed 
here is the total absence of correct identifications! Not on a single 
occasion were any of these beverages correctly named. As a matter of 
fact, it will be noted that the greater proportion of identification responses 
are to be found in the Coca Cola, Pepsi Cola and RC Cola columns with 
a sprinkling of namings under Seven Up, Dr. Pepper and Cleo Cola. 


Table 1 
Showing the Distribution of 288 Identification Responses when Each of the 96 Ss was 
Presented in Turn, but in Counterbalanced Order, with One 1 oz. Sample Each of 
Hyde Park, Kroger, and Spur Cola 

                  Frequency of Ss’ Identification Responses 

Brand of 
Beverage                                            Dr. 
Given Ss     H.P.   K.    S.   C.C.  Pep.  R.C.  7up  Pep.  Cleo.  Other  Total 
Hyde Park      0    0     0     39   [?]   [?]    1     1     0      4      96 
Kroger         0    0     0    [?]   [?]   [?]   [?]   [?]   [?]    [?]     96 
Spur           0    0     0     34    33    22    2     1     1      3      96 
Total          0    0     0    103    99    68    4     2     2     10     288 

[?] = entry illegible in the source. 




Our Ss in this experiment show essentially the same behaviors as those 
of Experiment II who were actually given Coca Cola, Pepsi Cola and RC 
Cola. Our hypothesis is, therefore, substantiated and we must conclude 
that our Ss have not discriminated from among the varieties of Cola 
beverages employed in our series of studies. Instead, they have applied 
a readily available repertoire of naming reactions whose probable source 
is advertising, familiarity through actual contact and other forms of 
culturization. One more point should be noted. In the total of 288 
identification responses, observe that the three relatively less well known 
beverages here employed were identified as Coca Cola 103 times, as 
Pepsi Cola 99, and as RC Cola 68 times. Again, as in previous studies, 
Coca Cola and Pepsi Cola names are applied with greatest frequency, 
with RC trailing as before. 

Table 2 
Showing the Distribution of 180 Identification Responses when Each of 60 Ss was 
Presented with Three 1 oz. Glasses of Either the Single Brand, 
Hyde Park, Kroger, or Spur Cola 

                  Frequency of Ss’ Identification Responses 

Brand of 
Beverage                                            Dr. 
Given Ss     H.P.   K.    S.   C.C.  Pep.  R.C.  7up  Pep.  Cleo.  Other  Total 
Hyde Park      0    0     0     25    21    12    0     1     1      0      60 
Kroger         0    0     0     22    25    13    0     0     0      0      60 
Spur           0    0     0     26    19     9    0     1     2      3      60 
Total          0    0     0     73    65    34    0     2     3      3     180 


Table 2 shows distribution of the 180 identifications of our 60 Ss who 
were presented with three samples of the same Cola. The reader will note 
that whether our Ss get three samples of Hyde Park, Kroger or Spur 
Cola, they nevertheless identify each of them most frequently as Coca 
Cola and Pepsi Cola and less frequently as RC Cola with the same sprinkling 
of other brands as before. The pattern of total frequencies is the 
same as in all parts of this and other experiments with Coca Cola and 
Pepsi Cola in the lead and RC as runner-up. We suggest that this consistent 
pattern may reflect the relative efficacy of the advertising of these 
three brands. 
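
No significance test is reported for the claim that the pattern of namings is the same in both parts. As a hedged illustration only, one way a reader might check it is a chi-square test of homogeneity on the totals stated in the text: 103, 99 and 68 of 288 responses in Part I, and 73, 65 and 34 of 180 in Part II. The "other" counts (18 and 8) are obtained here by subtraction, and the function name `chi_square_homogeneity` is hypothetical; this is not the authors' analysis.

```python
def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if both parts shared one naming distribution.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Columns: Coca Cola, Pepsi Cola, RC Cola, all other names.
part_1 = [103, 99, 68, 288 - (103 + 99 + 68)]  # three different Colas
part_2 = [73, 65, 34, 180 - (73 + 65 + 34)]    # same Cola three times

chi2, df = chi_square_homogeneity([part_1, part_2])
print(f"chi-square = {chi2:.2f} on {df} degrees of freedom")
```

On these figures the statistic is about 2.55 with 3 degrees of freedom, below the conventional .05 critical value of 7.81, which is in line with the statement that the two parts show the same pattern.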
Summary 

A group of 96 Ss was asked to identify one-oz. samples of Hyde Park 
Cola, Kroger Cola and Spur Cola presented in counterbalanced order. 

1. There were no correct identifications. Instead, these beverages 
were identified 103 times as Coca Cola, 99 times as Pepsi Cola, 68 times 
as RC and a total of 18 times as some other soft drink. 




2. Another group of 60 Ss, each of which was given three one-oz. 
samples of only one of the three Colas, showed an assortment of Coca 
Cola (73), Pepsi Cola (65) and RC Cola (34) identifications, again indicat- 
ing total absence of correct judgments. 

3. It is concluded that no matter whether our Ss are asked to identify 
three or four of the same or different Cola beverages, regardless of what 
particular brands are employed, their identification responses are 
inevitably “Coca Cola, Pepsi Cola, RC Cola.” 

4. The seven brands of Cola beverages employed in our series of studies 
appear to have the same stimulus function for our Ss and may be said to 
be “equivalent stimuli.” 

Received June 14, 1948. 
References 
1. Bowles, J. W., Jr., and Pronko, N. H. Identification of Cola beverages. II. A 
further study. J. appl. Psychol., 1948, 32, 559-564. 
2. Pronko, N. H., and Bowles, J. W., Jr. Identification of Cola beverages. I. First 
study. J. appl. Psychol., 1948, 32, 304-312. 


Book Reviews 


Samuel A. Stouffer et al. The American soldier: Volume I, Adjustment 
during army life; Volume II, Combat and its aftermath. Princeton: 
Princeton University Press, 1949. Pp. 600 each. Vol. 1 and 2, 
$18.50. Separate, $7.50. 

These are the first two of four volumes prepared and edited under the 
auspices of a Special Committee of the Social Science Research Council. 
The material reported is based upon data concerning soldiers’ attitudes 
collected by the Research Branch of the Information and Education 
Division, War Department, during the period, December, 1941, to August, 
1945. 

The basic tool which the Research Branch used was the questionnaire 
survey. A typical survey went through the following stages: (1) a 
request for information concerning attitudes or behavior of soldiers by some 
agency or branch of the army; (2) conferences between members of the 
Research Branch and the requesting agency in which the area to be in- 
vestigated was outlined; (3) a field trip by two or three members of the 
Research Branch in which casual conversations were held with enlisted 
men and officers on the topics to be covered in the survey; (4) preparation 
of a questionnaire based upon the experience gained in the field trip; (5) 
a pre-test of the questionnaire, followed by revisions; (6) final conferences 
with the requesting agency on the revised questionnaire; (7) clearance 
of the questionnaire with the Director of the Research Branch; (8) send- 
ing the questionnaire into the field. 

Samples for domestic surveys were selected in Washington; those for 
overseas surveys were selected in theater headquarters. In general the 
questionnaires were filled out directly by the men, personal interviews 
being limited to those in the lower educational groups. 

Over 200 such surveys were made by the Research Branch in which 
practices, preferences, and attitudes were investigated. The following 
partial list indicates something of the diversity of the areas surveyed: use 
of atabrine; practices related to trench foot; housing preferences of troops 
stationed in Alaska; popularity of various departments in Yank; radio 
listening habits; attitudes toward the British, Chinese, and Germans; 
leisure-time activities; attitudes toward the WACS, civilians, Negroes, 
and MP’s; practices in relation to venereal disease prevention; attitudes 
toward medical care, hospitalization, war, savings and insurance plans, 
rotation and demobilization; reactions to army films, orientation and 
indoctrination programs; attitudes toward service in the tropics, combat 
duty, job assignment, and training; reactions to German weapons; 
desires for educational courses; preferences in Christmas gifts; and 
studies of map reading, word pronunciation, and psychoneuroses. 

Attitudes of special groups on various problems were surveyed. Some 
of the groups investigated were: Negroes, combat troops, Special Service 
officers, MP’s, WACS, psychiatric patients, combat infantrymen, para- 
troopers, AWOL’s, hospital nurses, combat veterans, Air Force returnees, 
combat veterans in hospitals, B-29 officers and enlisted men, and hospital 
patients. 

As a result of these surveys, some concrete actions were taken by the 
army. One example: the point system for demobilization was based upon 
a survey of the factors which enlisted men thought should be considered 
and the relative weights which they thought should be assigned to these 
factors. In addition, a monthly bulletin, What the Soldier Thinks, was 
published and circulated down to regimental commanders. Another 
bulletin, the Monthly Progress Report, was prepared for higher level 
officers. Both bulletins were intended to keep officers informed con- 
cerning the attitudes of enlisted men and officers on various problems of 
interest to the army. 

Although the first two volumes are primarily intended for the army, 
historians, social psychologists, and sociologists, they contain much useful 
information for the industrial psychologist and personnel workers. This 
is indicated by the following topics discussed in some detail in the various 
chapters: job satisfaction and job morale; leadership and social control; 
motivation; social mobility; adjustment; promotion; training. Most of 
these subjects are covered in Volume I which deals with the period of 
adjustment during army life. Volume II is primarily concerned with 
combat and its aftermath. Since many readers of this journal will be 
primarily interested in Volume I, it is unfortunate that separate indexes 
for the two volumes were not prepared. As it is, a single index is included 
in Volume II. 

It is worth mentioning also that the index does not contain entries for 
the pertinent problems of reliability and validity, although some attempts 
at validation of the questionnaires were made and are discussed in the 
volumes. Another point of minor irritation is the frequent forward 
reference in Volumes I and II to the material to be published in Volumes 
III and IV. 

Charts, graphs, and tables are to be found on almost every other page. 
Some of these are very elaborate and detailed. Others show the percentage 
of various groups responding to a single question in a particular manner. 
The basic statistic is the percentage, although a few correlation 
coefficients are reported and an occasional chi square is to be found, usually 
in a footnote. This is not intended as a criticism. The reader should 
keep in mind that the work of the Research Branch was primarily directed 
toward the collection of information about attitudes which would be 
useful to the army planners of information, orientation, and educational 
programs. Their job was, as one of the authors calls it, a “practical 
engineering” job. From the evidence reported in these two volumes, it 
was a job well done. 
Allen L. Edwards 
The University of Washington 


Planty, Earl G., McCord, William S., and Efferson, Carlos A. Training 
employees and managers for production and teamwork. New York: 
Ronald Press, 1948. Pp. xiii + 278. $5.00. 

This book is characterized by breadth of viewpoint, purpose, scope, 
and principles. Training is conceived as a training of the whole em- 
ployee (on any job level) in attitudes, skills, and knowledges in such way 
that he fits into the framework of his job, his company, and the American 
system of free enterprise. The ambitious program undertaken by the 
authors could easily have led to a superficial or biased book. For the 
most part these potential dangers have been avoided. The authors indi- 
cate considerable evidence of sound professional training, practical ex- 
perience, common sense, and penetrating insight. The result is a book 
which will be of considerable value and interest to all persons directly or 
indirectly engaged in training in business and industry. 

The first two chapters cover what training is and what it will do. The 
second part, consisting of the next six chapters, deals with organizing, 
installing, and administering a training program. Basic principles are 
given together with specific techniques and mechanics. The third part 
of the book contains thirteen how-to-do-it chapters devoted to training 
programs and methods. Chapters are included on teaching aids, teach- 
ing in business and industry, the training staff, the small company, help- 
ful resources, and training for specific groups such as new employees, 
supervisors, technical and professional, trade and semiskill, and office 
employees. 

It was difficult for this reviewer to remain objective while appraising 
the book. The enthusiasm aroused by the first few chapters kept increasing 
throughout the entire book except for the chapter on technical 
and professional training. This chapter is of little value to the book or 
the reader except for the implication that such training has lagged far 
behind other types. A serious error of omission is the failure of the book 
to discuss training costs. Such a chapter would appear imperative in a 
book of this type. 




The authors have shown good judgment in striking a balance between 
various topics, have organized the material excellently, and have used 
well chosen examples and illustrations. They are to be commended for 
attacking some of the cliches currently held by training men; as, for 
example, the universal superiority of conferences as compared with 
lectures. The choice of words and mechanics of writing are considerably 
better than usually found in books of this type. These factors all add up 
to the best book on industrial training which this reviewer has ever seen. 


Clifford E. Jurgensen 
Minneapolis Gas Company 


deFord, Miriam Allen. Psychologist unretired, the life pattern of Lillian 
Martin. California: Stanford University Press, 1948. Pp. 130. 
$3.00. 

Lillian Martin, a most versatile psychologist, came from a distin- 
guished family, was self-supporting in her education, took her under- 
graduate work at Vassar, spent four years in graduate study of psychology 
in Germany in the 90’s, held a professorship of psychology in Stanford 
University from which she retired for age at 65 in 1916. She then turned 
to private practice and hospital service as a clinical psychologist and 
worked enthusiastically and with notable success for 27 years until she 
died at the age of 91. 

This book will be welcomed by lovers of biography as a model of 
literary style, by all clinical psychologists, and by all students of guidance 
in the art of aging. 

Winter Park, Carl E. Seashore 

Kaback, Goldie Ruth. Vocational personalities: an application of the 
Rorschach Group Method. New York, Columbia University: Bureau 
of Publications, 1946, Teachers College Contributions to Education 
No. 924. Pp. xii + 116. $2.10. 

This study tests the hypothesis that vocational choice is in part a 
function of personality, the differences in personality responsible for a 
given occupational choice being measurable by the Rorschach Group 
Method. 

Group Rorschachs were obtained from 75 accountants, 75 pharmacists, 
75 accountancy students and 75 pharmacy students. Brief life data 
sheets were filled in by the subjects. Mean ages were respectively 37, 
37, 18, and 20 years, ranges being greater for the pharmacist groups than 
for their comparison accountant groups. Accountants averaged more 
education than pharmacists; students were similar to each other. 
Pharmacists had 16, accountants 13 years of work experience in their field. 
A miscellaneous vocational group numbering 108 was obtained from 
(apparently) a co-worker and responses were compared with the four criterion 
groups studied. Reliability of Rorschach scoring was not reported. 

Kaback finds that she can differentiate accountants from pharmacists 
by multiple correlation of .64 between the criterion and 24 Rorschach 
components. This r is too low for effective prediction. Accountants 
were significantly higher than pharmacists in W, d, R, P, O, Fc, M, FM, 
Fm, H+A, Aobj., Pl and Obj. Pharmacists were significantly higher on 
Fk and At. This means (roughly) that accountants were more productive 
(this factor alone seems to account for many of the differences shown), 
higher intellectually, more “socially sensitive—or aware of a need for 
social contact,” more cautious, creative and original and perhaps more 
widely diversified in interests than pharmacists, while the latter showed 
more “anxiety.” These differences, as the author points out, while 
statistically significant, are not of great practical value in vocational 
counseling. 

For students, a multiple r against the criterion of .653 was shown (a 
respectable r, but not high enough for good prediction). Accountants in 
training were higher than pharmacists in training on R, P, H+A, W, M, 
Fm, F, FC, Obj., D, Emb., and Arch; hence differences resemble the 
professional differences. 
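
The reviewer does not state the yardstick by which correlations of .64 and .653 are judged too low for prediction. One conventional yardstick, offered here only as an assumption, is the index of forecasting efficiency, E = 1 - sqrt(1 - r^2), the proportionate reduction in the standard error of estimate compared with guessing the mean. A minimal sketch (the function name is hypothetical):

```python
import math


def forecasting_efficiency(r):
    """Index of forecasting efficiency, E = 1 - sqrt(1 - r**2).

    sqrt(1 - r**2) is the coefficient of alienation; E is the share by
    which errors of estimate shrink relative to predicting the mean.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return 1.0 - math.sqrt(1.0 - r * r)


# Multiple correlations quoted in the review.
for label, r in (("professional groups", 0.64), ("student groups", 0.653)):
    print(f"{label}: r = {r}, r squared = {r * r:.3f}, "
          f"E = {forecasting_efficiency(r):.3f}")
```

By this yardstick both coefficients reduce errors of estimate by somewhat less than one quarter, which is consistent with the reviewer's judgment.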

The author concludes: all groups normal, reasonably well adjusted 
(accountancy students best), accountants more intelligent, pharmacists 
less mature and impulsive (for both students and professional persons); 
and students of both groups show selves as more practical than the pro- 
fessional groups. 

The author is to be commended for a type of study greatly needed. 
It seems however, first, that such a study can hardly attain highly reliable 
or valid results as long as socioeconomic pressures, family predilections, 
etc., determine the chosen career, often quite without regard to the personal 
wishes or capabilities of the individual involved; and that, second, a 
Rorschach study is difficult if not impossible when scoring categories of 
the test are considered (as is almost [illegible]) [illegible] refinements 
in statistics of percentages, ratios, and sub-relationships. 

The most ardent proponent of the Rorschach would probably not regard 
it as infallible when adapted as a group method and scored for 
means, using isolated scoring categories. In the hands of a competent 
technician, used to illuminate other data, it often seems to be of great 
use in vocational counseling; more “clinical” validations along the rather 
promising lines shown by the author may result from this interesting, 
carefully done, but practically not too useful study, the final conclusion 
of which is necessarily that there are all types of persons who enter or 
study in the fields of accountancy and pharmacy. 
Boyd McCandless 
The Ohio State University 




Eysenck, H. J. Dimensions of personality. Cambridge, at the University 
Press; New York: The Macmillan Company, 1947. Pp. 308. $5.00. 

This book is the result of a cooperative effort to discover the main 
dimensions of personality and to define them operationally by means of 
strict experimental, quantitative procedures. About forty distinct 
researches were carried out on some 10,000 normal and neurotic subjects by 
a research team of psychologists and psychiatrists at the Mill Hill 
Emergency Hospital, a war time neurosis center in London. Dr. Eysenck 
and his collaborators have turned to fruitful purpose the opportunities 
such a place affords for studying some of the main factors of personality 
and have shown how profitable may be the collaboration of psychologists 
and psychiatrists in pursuit of this aim. 

The first chapter entitled, Methods and Definitions describes the 
methodological conditions underlying the researches, and the working 
concepts and theories of temperament and personality structure. The 
second chapter, entitled, Assessments and Ratings presents a factorial 
analysis of 39 trait ratings, carried out by psychiatrists on 700 neurotic 
patients; also described is a study with questionnaires dealing with 
“neuroticism,” “persistence,” and “irritability.” Physique and Constitu- 
tion concerns studies performed on normal and neurotic subjects through 
the methods of factorial analysis wherein main dimensions of body con- 
figuration were isolated objectively; certain personality differences were 
also found with respect to body size. The fourth chapter, Ability and 
Efficiency presents discussions and data on intelligence and neurosis, 
“scatter,” level of aspiration, perseveration, and persistence. Suggestibil- 
ity and Hypnosis attempts to distinguish various types of suggestibility, 
to establish the relation between suggestibility and hypnosis, and to dis- 
cover personality correlates of suggestibility (its relation to hysteria and 
neuroticism). Chapter six, Appreciation and Expression offers experi- 
mental evidence on preference judgments. Synthesis and Conclusions, 
chapter seven, summarizes the researches indicating two main personality 
dimensions namely, “neuroticism” and “extraversion-introversion,” 
wherein the former is a general factor in the conative sphere, while the 
latter is a general factor in the affective sphere. 

While this book may presently contribute little that is useful to 
harassed personnel psychologists, these specialists should nevertheless 
be familiar with the methods of investigation utilized. Much that is 
presented may influence our concepts of psychopathology in the future. 
However, it is recommended to all applied psychologists working in 
clinical fields and is “must” reading for the small corps of experimental 
clinical psychologists. 
Arthur Weider 
University of Louisville School of Medicine, 

Dept. of Psychiatry, Div. of Psychosomatic Medicine 


New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should be sent to 
Donald G. Paterson, Editor, Department of Psychology, University 
of Minnesota, Minneapolis 14, Minnesota 


The Harvard list of books in psychology. Gordon W. Allport et al. Cam- 
bridge: Harvard University Press, 1949. Pp. 77. $1.00. 

Differential psychology. Revised edition. Anne Anastasi and John P. 
Foley, Jr. New York: The Macmillan Co., 1949. Pp. 894. $5.50. 

Twilight in India. Gervee Baronte. New York: Philosophical Library, 
1949. Pp. 382. $3.75. 

Orientation et selection professionnelles par l’examen psychologique du 
caractere. Fr. Baumgarten. Paris, France: Dunod, Editeur, 92, 
Rue Bonaparte, 1949. Pp. 184. 680 fr. 

Social psychology of modern life. Revised edition. Steuart H. Britt. 
New York: Rinehart and Co., Inc., 1949. Pp. 703. $4.50. 

Labor relations and hindrances to full production. S. L. Burk. San 
Francisco: California Personnel Management Association, 1948. 
Pp. 18. $1.00. 

The neurosis of man. Trigant Burrow. New York: Harcourt, Brace, 
1949. Pp. 428. $7.50. 

Reading manual and workbook. Homer L. J. Carter and Dorothy J. 
McGinnis. New York: Prentice-Hall, Inc., 1949. Pp. 120. $1.75. 

Human relations in public administration. Alfred De Grazia. Chicago: 
Public Administration Service, 1949. Pp. 52. $1.50. 

Criteria for the life history. John Dollard. Reprint. New York: 
Peter Smith, Publisher, 1949. Pp. 294. $3.25. 

The supervisor's management guide. M. Joseph Dooher and Vivienne 
Marquis, Editors. New York: American Management Association, 
1949. Pp. 190. $3.50. 

We human chemicals. Thomas Dreier. Scarsdale: Updegraff Press, 
Ltd., 1949. Pp. 122. $2.00. 

Mental testing. Florence L. Goodenough. New York: Rinehart and 
Co., Inc., 1949. Pp. 609. $5.00. 

Readings in social security. William Haber and Wilbur Cohen. New 
York: Prentice-Hall, Inc., 1948. Pp. 634. $5.75. 

Psychosexual development in health and disease. Paul H. Hoch and 
Joseph Zubin, Editors. New York: Grune and Stratton, 1949. Pp. 
283. $4.50. 



Group guidance, principles, techniques, and evaluation. Robert Hoppock. 
New York: McGraw-Hill Book Co., Inc., 1949. Pp. 393. $3.75. 

Motor performance and growth. A developmental study of static dynamo- 
metric growth. Harold E. Jones. Berkeley: University of California 
Press, 1949. Pp. 181. $3.00, cloth, $2.00, paper. 

Helping students find employment. Forrest H. Kirkpatrick et al. Wash- 
ington, D. C.: American Council on Education Studies, 1949. Pp. 37. 
$.75. 

Studies in human behavior. Merle Lawrence. Princeton: Princeton 
University Press, 1949. Pp. 184. $3.50. 

Explorations in personal adjustment—a workbook. George F. J. Lehner. 
New York: Prentice-Hall, Inc., 1949. Pp. 144. $1.50. 

Mental hygiene in public health. Paul V. Lemkau. New York: 
McGraw-Hill Book Co., Inc., 1949. Pp. 396. $4.50. 

The psychology of personal adjustment. Second edition. Fred McKinney. 
New York: Wiley and Sons, Inc., 1949. Pp. 752. $6.00. 

Selling performance and contentment in relation to school background. 
Albert C. Mossin. New York: Bureau of Publications, Teachers 
College, Columbia University, 1949. Pp. 166. $2.75. 

The pre-election polls of 1948. Frederick Mosteller et al. New York: 
Social Science Research Council, 1949. Pp. 396. Paper, $2.50; Cloth, 
$3.00. 

The nature-nurture controversy. Nicholas Pastore. New York: Columbia 
University Press, 1949. Pp. 213. $3.25. 

Perception of symbol orientation and early reading success. Muriel 
Catherine Potter. New York: Bureau of Publications, Teachers 
College, Columbia University, 1949. Pp. 69. $2.10. 

Human relations in an expanding company. F. L. W. Richardson and 
Charles R. Walker. New Haven: Yale University Labor and Man- 
agement Center, 1948. Pp. 95. $1.50. 

Letters to my son. Dagobert D. Runes. New York: The Philosophical 
Library, 1949. Pp. 92. $2.75. 

Happiness for husbands and wives. Harold Shryock. Washington, 
D. C.: Review and Herald Publishing Association, 1949. Pp. 256. 

Rehabilitation of the handicapped. William H. Soden, Editor. New 
York: Ronald Press Co., 1949. Pp. 399. $5.00. 

Personnel management for supervisors. Claude E. Thompson. New 
York: Prentice-Hall, Inc., 1949. Pp. 208. $2.95. 

Selected writings from a connectionist’s psychology. Edward L. Thorndike. 
New York: Appleton-Century-Crofts, Inc., 1949. Pp. 370. $3.50. 

Working with people. Auren Uris and Betty Shapin. New York: The 
Macmillan Co., 1949. Pp. 314. Probable price, $3.50. 




Experimental foundations of general psychology. Third edition. Willard 
L. Valentine and Delos D. Wickens. New York: Rinehart and Co., 
Inc., 1949. Pp. 472. $3.00. 

Community under stress. Elizabeth Head Vaughan. Princeton: Prince- 
ton University Press, 1949. Pp. 160. $2.50. 

Personnel selection in the British forces. Philip E. Vernon and John B. 
Parry. London: University of London Press Ltd., 1949. Pp. 324. 
20/-net. 

Children with mental and physical handicaps. J. E. Wallace Wallin. New 
York: Prentice-Hall Inc., 1949. Pp. 549. $5.00. 

Out-of-school vocational guidance. Roswell Ward. New York: Harper 
and Brothers, 1949. Pp. 155. $2.50. 

Constructing classroom examinations. Ellis Weitzman and Walter J. 
McNamara. Chicago: Science Research Associates, 1949. Pp. 153. 
$3.00. 

Supervisory training—why, what, how. ILIR Publications, Series A, Vol. 
3, No. 3, Urbana: Institute of Labor and Industrial Relations, Uni- 
versity of Illinois, 1949. Pp. 24. Gratis. 

Recruitment, selection, and indoctrination of female clerical employees. 
Personnel and Training Series Research Project Report No. 4. New 
York: Scott Foresman and Co., Miss Edith Harper, 1949. Pp. 35. 
$2.00. 

Sales manager's handbook. Chicago: The Dartnell Corporation, 1949. 
Pp. 1150. $10.00. 

Training for tomorrow: proceedings of the Third Annual Training Confer- 
ence of Educational Directors in Industry and Commerce. Montreal: 
Canadian Industrial Trainers’ Association, 1949. Pp. 123. $2.00. 

Executive personality and job success. New York: American Management 
Association, 1948. Pp. 35. $.75. 

Building worker interest in production problems. New York: American 
Management Association, 1949. Pp. 34. $.75. 

Management’s role in industrial mobilization. New York: American 
Management Association, 1948. Pp. 27. $.50. 

Organization controls and executive compensation. New York: American 
Management Association, 1948. Pp. 54. $1.00. 

Personnel functions and the line organization. New York: American 
Management Association, 1948. Pp. 31. $.75. 

Worker morale and productivity. New York: American Management 
Association, 1948. Pp. 38. $.75. 


REPRINTS AND ARTICLES 


of the 


American Psychological Association 
1515 Massachusetts Ave., Northwest, Washington 5, D. C. 


Johnny Rocco, by Jean Evans. Second reprinting. 
From the Journal of Abnormal and Social 
Psychology, July, 1948. One for 25¢; 50 for 
$10.00. 


A New Readability Yardstick, by Rudolf Flesch. 
Second reprinting. From the Journal of Ap- 
plied Psychology, June, 1948. One for 25¢; 50 
for $5.00. 


Tables for Use with the Flesch Readability Formu- 
las, by J. N. Farr and James J. Jenkins. From 
the Journal of Applied Psychology, June, 1949. 
One for 25¢; 50 for $10.00. 


Stipends for Graduate Students in Psychology: 
1949-50, by the APA office staff. From the 
American Psychologist, January, 1949. One 
for 25¢. 


Available Internships in Psychology, by Helen 
Wolfe. From the American Psychologist, 
February, 1949. One for 25¢. 


Occupations in Psychology, by Carroll L. Shartle. 
The December 1946 issue of the American 
Psychologist. One for 50¢; 100 for $35.00. 


The Psychological and Social Sciences in the Na- 
tional Military Establishment, by Lyle H. 
Lanier. The May 1949 issue of the American 
Psychologist. One for 75¢. 


